Results 1 - 20 of 289
1.
PLoS One ; 17(6): e0270034, 2022.
Article in English | MEDLINE | ID: mdl-35771807

ABSTRACT

There remains a limited understanding of the HIV prevention and treatment needs among female sex workers in many parts of the world. Systematic reviews of existing literature can help fill this gap; however, well-done systematic reviews are time-demanding and labor-intensive. Here, we propose an automatic document classification approach to a systematic review to significantly reduce the effort in reviewing documents and optimize empiric decision making. We first describe a manual document classification procedure that is used to curate a pertinent training dataset and then propose three classifiers: a keyword-guided method, a cluster analysis-based method, and a random forest approach that utilizes a large set of feature tokens. This approach is used to identify documents studying female sex workers that contain content relevant to either HIV or experienced violence. We compare the performance of the three classifiers by cross-validation in terms of area under the curve of the receiver operating characteristic and the precision-recall plot, and find that the random forest approach reduces the amount of manual reading for our example by 80%; in sensitivity analysis, we find that even when trained with only 10% of the data, the classifier can still avoid reading 75% of future documents (68% of total) while retaining 80% of relevant documents. In sum, the automated procedure of document classification presented here could improve both the precision and efficiency of systematic reviews and facilitate live reviews, where reviews are updated regularly. We expect to obtain a reasonable classifier by taking 20% of retrieved documents as training samples. The proposed classifier could also be used for more meaningfully assembling literature in other research areas and for rapid document screening on a tight schedule, such as COVID-related work during the crisis.


Subject(s)
COVID-19 , HIV Infections , Sex Workers , Systematic Reviews as Topic , Female , HIV Infections/diagnosis , HIV Infections/prevention & control , Humans , ROC Curve
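
The workload figures quoted in record 1 (documents skipped at a fixed share of relevant documents retained) can be illustrated with a small sketch. This is not the authors' code: the function name `screening_workload`, the toy scores, and the rule of reading documents in descending score order are all assumptions made for illustration. The last line checks the abstract's arithmetic: with 10% of documents used for training, skipping 75% of the remaining 90% gives 0.9 × 0.75 = 0.675, consistent with the quoted 68% of the total.

```python
import math


def screening_workload(scores, labels, target_recall=0.8):
    """Read documents in descending classifier score until the target share
    of relevant documents has been found.

    Returns (score_threshold, fraction_skipped, recall_reached).
    `labels` are 1 for relevant and 0 for irrelevant (hypothetical data).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_relevant = sum(label for _, label in ranked)
    if total_relevant == 0:
        raise ValueError("no relevant documents in the labelled sample")
    # Smallest number of relevant documents that satisfies the recall target.
    needed = math.ceil(target_recall * total_relevant)
    found = 0
    for n_read, (score, label) in enumerate(ranked, start=1):
        found += label
        if found >= needed:
            return score, 1 - n_read / len(ranked), found / total_relevant


# Toy example: 10 documents, 5 of them relevant (made-up scores and labels).
toy_scores = [0.95, 0.9, 0.85, 0.8, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]
threshold, skipped, recall = screening_workload(toy_scores, toy_labels)

# Share of ALL documents left unread when 10% are hand-labelled for training
# and 75% of the remaining ("future") documents are skipped.
share_of_total_skipped = (1 - 0.10) * 0.75
```

In the toy data, four of the five relevant documents are found after reading the five highest-scoring documents, so half of the collection is skipped at 80% recall.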
2.
Nat Commun ; 12(1): 5392, 2021 09 13.
Article in English | MEDLINE | ID: mdl-34518529

ABSTRACT

Across a range of creative domains, individual careers are characterized by hot streaks, which are bursts of high-impact works clustered together in close succession. Yet it remains unclear if there are any regularities underlying the beginning of hot streaks. Here, we analyze career histories of artists, film directors, and scientists, and develop deep learning and network science methods to build high-dimensional representations of their creative outputs. We find that across all three domains, individuals tend to explore diverse styles or topics before their hot streak, but become notably more focused after the hot streak begins. Crucially, hot streaks appear to be associated with neither exploration nor exploitation behavior in isolation, but a particular sequence of exploration followed by exploitation, where the transition from exploration to exploitation closely traces the onset of a hot streak. Overall, these results may have implications for identifying and nurturing talents across a wide range of creative domains.

3.
J Youth Adolesc ; 50(11): 2236-2248, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34417965

ABSTRACT

Youth of immigrant background are at risk of experiencing victimization due to their ethnic or cultural background. However, limited knowledge is available regarding why youth victimize their immigrant peers, and whether the factors associated with engagement in ethnic victimization vary across adolescents of different background. To address this gap in knowledge, the present study aimed to elucidate the common or differential factors associated with engagement in ethnic victimization among immigrant and native youth. The analytical sample included seventh grade students residing in Sweden from 55 classrooms (N = 963, Mage = 13.11, SD = 0.41; 46% girls; 38% youth of immigrant background). The results showed that being morally disengaged and engaging in general victimization are the common denominators of engagement in ethnic victimization for immigrant and Swedish youth. Low levels of positive attitudes toward immigrants provide a foundation for ethnic victimization among Swedish youth, but not youth of immigrant background. Classroom ethnic composition was not significantly related to engagement in ethnic victimization in either group. Predictors of engagement in ethnic victimization seem to have similarities and differences among immigrant and Swedish youth. The factors involved require further attention in developing strategies to combat bias-based hostile behaviors in diverse school settings.


Subject(s)
Bullying , Crime Victims , Emigrants and Immigrants , Adolescent , Ethnicity , Female , Humans , Male , Sweden
4.
Hum Comput Interact ; 36(2): 150-201, 2021.
Article in English | MEDLINE | ID: mdl-33867652

ABSTRACT

Digital experiences capture an increasingly large part of life, making them a preferred, if not required, method to describe and theorize about human behavior. Digital media also shape behavior by enabling people to switch between different content easily, and create unique threads of experiences that pass quickly through numerous information categories. Current methods of recording digital experiences provide only partial reconstructions of digital lives that weave - often within seconds - among multiple applications, locations, functions and media. We describe an end-to-end system for capturing and analyzing the "screenome" of life in media, i.e., the record of individual experiences represented as a sequence of screens that people view and interact with over time. The system includes software that collects screenshots, extracts text and images, and allows searching of a screenshot database. We discuss how the system can be used to elaborate current theories about psychological processing of technology, and suggest new theoretical questions that are enabled by multiple time scale analyses. Capabilities of the system are highlighted with eight research examples that analyze screens from adults who have generated data within the system. We end with a discussion of future uses, limitations, theory and privacy.

5.
Entropy (Basel) ; 23(1), 2021 Jan 19.
Article in English | MEDLINE | ID: mdl-33478020

ABSTRACT

Recently, there has been a resurgence of formal language theory in deep learning research. However, most research has focused on the more practical problems of attempting to represent symbolic knowledge by machine learning. In contrast, there has been limited research on exploring the fundamental connection between them. To obtain a better understanding of the internal structures of regular grammars and their corresponding complexity, we focus on categorizing regular grammars by using both theoretical analysis and empirical evidence. Specifically, motivated by the concentric ring representation, we relaxed the original order information and introduced an entropy metric for describing the complexity of different regular grammars. Based on the entropy metric, we categorized regular grammars into three disjoint subclasses: the polynomial, exponential and proportional classes. In addition, several classification theorems are provided for different representations of regular grammars. Our analysis was validated by examining the process of learning grammars with multiple recurrent neural networks. Our results show that, as expected, more complex grammars are generally more difficult to learn.
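
The abstract above does not state the entropy formula, so the sketch below does not attempt to reproduce it. It only illustrates the kind of quantity such a complexity measure can be built on: how many binary strings of each length a regular language accepts. The two automata (the language 1* and the language of strings without the substring 000) and all identifiers are illustrative assumptions, not taken from the paper.

```python
def count_accepted(transitions, start, accepting, max_len):
    """Count the strings of each length 0..max_len accepted by a complete DFA.

    `transitions[state][symbol]` gives the next state. The count is done by
    dynamic programming over lengths, without enumerating strings.
    """
    # state_counts[q] = number of strings of the current length ending in q
    state_counts = {state: 0 for state in transitions}
    state_counts[start] = 1
    counts = []
    for _ in range(max_len + 1):
        counts.append(sum(state_counts[q] for q in accepting))
        next_counts = {state: 0 for state in transitions}
        for state, n in state_counts.items():
            for target in transitions[state].values():
                next_counts[target] += n
        state_counts = next_counts
    return counts


# Language 1*: any 0 leads to a dead state.
ONES_ONLY = {
    "ok": {"0": "dead", "1": "ok"},
    "dead": {"0": "dead", "1": "dead"},
}

# Strings with no "000": the state records the number of trailing zeros.
NO_TRIPLE_ZERO = {
    "z0": {"0": "z1", "1": "z0"},
    "z1": {"0": "z2", "1": "z0"},
    "z2": {"0": "dead", "1": "z0"},
    "dead": {"0": "dead", "1": "dead"},
}

ones_counts = count_accepted(ONES_ONLY, "ok", {"ok"}, 5)
no000_counts = count_accepted(NO_TRIPLE_ZERO, "z0", {"z0", "z1", "z2"}, 5)
# Fraction of all 2**n binary strings of length n that are accepted.
no000_fractions = [c / 2 ** n for n, c in enumerate(no000_counts)]
```

The first language accepts exactly one string per length, while the accepted count for the second grows exponentially, which is the sort of contrast a growth-based classification of grammars would separate.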

6.
Neural Comput ; 32(7): 1355-1378, 2020 07.
Article in English | MEDLINE | ID: mdl-32433903

ABSTRACT

Data samples collected for training machine learning models are typically assumed to be independent and identically distributed (i.i.d.). Recent research has demonstrated that this assumption can be problematic as it simplifies the manifold of structured data. This has motivated different research areas such as data poisoning, model improvement, and explanation of machine learning models. In this work, we study the influence of a sample on determining the intrinsic topological features of its underlying manifold. We propose the Shapley homology framework, which provides a quantitative metric for the influence of a sample on the homology of a simplicial complex. Our proposed framework consists of two main parts: homology analysis, where we compute the Betti number of the target topological space, and Shapley value calculation, where we decompose the topological features of a complex built from data points to individual points. By interpreting the influence as a probability measure, we further define an entropy that reflects the complexity of the data manifold. Furthermore, we provide a preliminary discussion of the connection of the Shapley homology to the Vapnik-Chervonenkis dimension. Empirical studies show that when the zero-dimensional Shapley homology is used on neighboring graphs, samples with higher influence scores have a greater impact on the accuracy of neural networks that determine graph connectivity and on several regular grammars whose higher entropy values imply greater difficulty in being learned.
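
As a rough illustration of the zero-dimensional case mentioned above, the sketch below computes exact Shapley values when the value of a vertex subset is taken to be the number of connected components (the zeroth Betti number) of the induced subgraph. This choice of value function, the brute-force enumeration of all orderings, and every identifier are assumptions for illustration; the paper's normalisation into a probability measure is not reproduced, and the factorial cost limits this to very small graphs.

```python
from itertools import permutations


def count_components(vertices, edges):
    """Number of connected components of the subgraph induced by `vertices`."""
    vertices = set(vertices)
    parent = {v: v for v in vertices}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        if a in vertices and b in vertices:
            parent[find(a)] = find(b)
    return len({find(v) for v in vertices})


def shapley_components(vertices, edges):
    """Exact Shapley value of each vertex for the component-count game."""
    vertices = list(vertices)
    totals = {v: 0 for v in vertices}
    n_orders = 0
    for order in permutations(vertices):
        n_orders += 1
        present = []
        previous = 0
        for v in order:
            present.append(v)
            current = count_components(present, edges)
            totals[v] += current - previous  # marginal contribution of v
            previous = current
    return {v: totals[v] / n_orders for v in vertices}


# Path graph a - b - c (hypothetical toy example).
path_values = shapley_components(["a", "b", "c"], [("a", "b"), ("b", "c")])
```

For the three-vertex path, the two end vertices each receive 0.5 and the middle vertex receives 0, and the values sum to 1, the number of components of the whole graph.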

7.
IEEE Trans Neural Netw Learn Syst ; 31(10): 4267-4278, 2020 10.
Article in English | MEDLINE | ID: mdl-31976910

ABSTRACT

Temporal models based on recurrent neural networks have proven to be quite powerful in a wide variety of applications, including language modeling and speech processing. However, training these models often relies on backpropagation through time (BPTT), which entails unfolding the network over many time steps, making the process of conducting credit assignment considerably more challenging. Furthermore, the nature of backpropagation itself does not permit the use of nondifferentiable activation functions and is inherently sequential, making parallelization of the underlying training process difficult. Here, we propose the parallel temporal neural coding network (P-TNCN), a biologically inspired model trained by the learning algorithm we call local representation alignment. It aims to resolve the difficulties and problems that plague recurrent networks trained by BPTT. The architecture requires neither unrolling in time nor the derivatives of its internal activation functions. We compare our model and learning procedure with other BPTT alternatives (which also tend to be computationally expensive), including real-time recurrent learning, echo state networks, and unbiased online recurrent optimization. We show that it outperforms these on sequence modeling benchmarks such as Bouncing MNIST, a new benchmark we denote as Bouncing NotMNIST, and Penn Treebank. Notably, our approach can, in some instances, outperform full BPTT as well as variants such as sparse attentive backtracking. Significantly, the hidden unit correction phase of P-TNCN allows it to adapt to new data sets even if its synaptic weights are held fixed (zero-shot adaptation) and facilitates retention of prior generative knowledge when faced with a task sequence. We present results that show the P-TNCN's ability to conduct zero-shot adaptation and online continual sequence modeling.

8.
Front Res Metr Anal ; 5: 600382, 2020.
Article in English | MEDLINE | ID: mdl-33870061

ABSTRACT

Subject categories of scholarly papers generally refer to the knowledge domain(s) to which the papers belong, examples being computer science or physics. Subject category classification is a prerequisite for bibliometric studies, organizing scientific publications for domain knowledge extraction, and facilitating faceted searches for digital library search engines. Unfortunately, many academic papers do not have such information as part of their metadata. Most existing methods for solving this task focus on unsupervised learning that often relies on citation networks. However, a complete list of papers citing the current paper may not be readily available. In particular, new papers that have few or no citations cannot be classified using such methods. Here, we propose a deep attentive neural network (DANN) that classifies scholarly papers using only their abstracts. The network is trained using nine million abstracts from Web of Science (WoS). We also use the WoS schema that covers 104 subject categories. The proposed network consists of two bi-directional recurrent neural networks followed by an attention layer. We compare our model against baselines by varying the architecture and text representation. Our best model achieves a micro-F1 measure of 0.76, with the F1 of individual subject categories ranging from 0.50 to 0.95. The results showed the importance of retraining word embedding models to maximize the vocabulary overlap and the effectiveness of the attention mechanism. The combination of word vectors with TFIDF outperforms character and sentence level embedding models. We discuss imbalanced samples and overlapping categories and suggest possible strategies for mitigation. We also determine the subject category distribution in CiteSeerX by classifying a random sample of one million academic papers.

9.
Phys Rev B ; 100, 2019.
Article in English | MEDLINE | ID: mdl-33123651

ABSTRACT

The pressure evolution of the magnetic properties of the Ce2RhIn7.79Cd0.21 heavy fermion compound was investigated by single crystal neutron magnetic diffraction and electrical resistivity experiments under applied pressure. From the neutron magnetic diffraction data, up to P = 0.6 GPa, we found no changes in the magnetic structure or in the ordering temperature TN = 4.8 K. However, the increase of pressure induces an interesting spin rotation of the ordered antiferromagnetic moment of Ce2RhIn7.79Cd0.21 into the ab tetragonal plane. From the electrical resistivity measurements under pressure, we have mapped the evolution of TN and the maximum of the temperature dependent electrical resistivity (TMAX) as a function of the pressure (P ≲ 3.6 GPa). To gain some insight into the microscopic origin of the observed spin rotation as a function of pressure, we have also analyzed some macroscopic magnetic susceptibility data at ambient pressure for pure and Cd-doped Ce2RhIn8 using a mean-field model including tetragonal crystalline electric field (CEF). The analysis indicates that these compounds have a Kramers doublet Γ7−-type ground state, followed by a Γ7+ first excited state at Δ1 ∼ 80 K and a Γ6 second excited state at Δ2 ∼ 270 K for Ce2RhIn8 and Δ2 ∼ 250 K for Ce2RhIn7.79Cd0.21. The evolution of the magnetic properties of Ce2RhIn8 as a function of Cd doping and the rotation of the direction of the ordered moment for the Ce2RhIn7.79Cd0.21 compound under pressure suggest important changes of the single ion anisotropy of Ce3+ induced by applying pressure and Cd doping in these systems. These changes are reflected in modifications in the CEF scheme that will ultimately affect the actual ground state of these compounds.

10.
Neural Comput ; 30(9): 2568-2591, 2018 09.
Article in English | MEDLINE | ID: mdl-30021081

ABSTRACT

Rule extraction from black box models is critical in domains that require model validation before implementation, as can be the case in credit scoring and medical diagnosis. Though already a challenging problem in statistical learning in general, the difficulty is even greater when highly nonlinear, recursive models, such as recurrent neural networks (RNNs), are fit to data. Here, we study the extraction of rules from second-order RNNs trained to recognize the Tomita grammars. We show that production rules can be stably extracted from trained RNNs and that in certain cases, the rules outperform the trained RNNs.

11.
Nature ; 559(7714): 396-399, 2018 07.
Article in English | MEDLINE | ID: mdl-29995850

ABSTRACT

The hot streak, loosely defined as 'winning begets more winnings', highlights a specific period during which an individual's performance is substantially better than his or her typical performance. Although hot streaks have been widely debated in sports [1,2], gambling [3-5] and financial markets [6,7] over the past several decades, little is known about whether they apply to individual careers. Here, building on rich literature on the lifecycle of creativity [8-22], we collected large-scale career histories of individual artists, film directors and scientists, tracing the artworks, films and scientific publications they produced. We find that, across all three domains, hit works within a career show a high degree of temporal regularity, with each career being characterized by bursts of high-impact works occurring in sequence. We demonstrate that these observations can be explained by a simple hot-streak model, allowing us to probe quantitatively the hot streak phenomenon governing individual careers. We find this phenomenon to be remarkably universal across diverse domains: hot streaks are ubiquitous yet usually unique across different careers. The hot streak emerges randomly within an individual's sequence of works, is temporally localized, and is not associated with any detectable change in productivity. We show that, because works produced during hot streaks garner substantially more impact, the uncovered hot streaks fundamentally drive the collective impact of an individual, and ignoring this leads us to systematically overestimate or underestimate the future impact of a career. These results not only deepen our quantitative understanding of patterns that govern individual ingenuity and success, but also may have implications for identifying and nurturing individuals whose work will have lasting impact.


Subject(s)
Art , Culture , Efficiency , Motion Pictures/statistics & numerical data , Research Personnel/statistics & numerical data , Research/statistics & numerical data , Science , Task Performance and Analysis , Career Mobility , Creativity , Humans , Research Personnel/psychology , Social Change , Time Factors
12.
J Biomed Inform ; 68: 1-19, 2017 04.
Article in English | MEDLINE | ID: mdl-28213145

ABSTRACT

It is believed that anomalous mental states such as stress and anxiety not only cause suffering for the individuals, but also lead to tragedies in some extreme cases. The ability to predict the mental state of an individual at both current and future time periods could prove critical to healthcare practitioners. Currently, the practical way to predict an individual's mental state is through mental examinations that involve psychological experts performing the evaluations. However, such methods can be time and resource consuming, limiting their applicability to a wide population. Furthermore, some individuals may also be unaware of their mental states or may feel uncomfortable expressing themselves during the evaluations. Hence, their anomalous mental states could remain undetected for a prolonged period of time. The objective of this work is to demonstrate the ability of advanced machine learning based approaches to generate mathematical models that predict current and future mental states of an individual. The problem of mental state prediction is transformed into a time series forecasting problem, where an individual is represented as a multivariate time series stream of monitored physical and behavioral attributes. A personalized mathematical model is then automatically generated to capture the dependencies among these attributes, which is used for prediction of mental states for each individual. In particular, we first illustrate the drawbacks of traditional multivariate time series forecasting methodologies such as vector autoregression. Then, we show that such issues could be mitigated by using machine learning regression techniques which are modified for capturing temporal dependencies in time series data.
A case study using the data from 150 human participants illustrates that the proposed machine learning based forecasting methods are more suitable for high-dimensional psychological data than the traditional vector autoregressive model in terms of both magnitude of error and directional accuracy. These results not only present a successful usage of machine learning techniques in psychological studies, but also serve as a building block for multiple medical applications that could rely on an automated system to gauge individuals' mental states.


Subject(s)
Emotions , Machine Learning , Mental Health , Forecasting , Humans , Models, Theoretical
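
Record 12 compares forecasting methods by magnitude of error and directional accuracy but does not define either measure in the abstract. The sketch below shows one common reading of both, as assumptions: mean absolute error for the magnitude, and for direction the share of steps at which the forecast moves the same way as the observed series relative to the previous observed value. Function names and data are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and forecast values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def _sign(x):
    # Returns 1, 0 or -1.
    return (x > 0) - (x < 0)


def directional_accuracy(actual, predicted):
    """Share of steps where the forecast change has the same sign as the
    observed change, both measured from the previous observed value."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    hits = 0
    for t in range(1, len(actual)):
        observed_change = actual[t] - actual[t - 1]
        forecast_change = predicted[t] - actual[t - 1]
        if _sign(observed_change) == _sign(forecast_change):
            hits += 1
    return hits / (len(actual) - 1)


# Made-up stress scores for one participant over four time points.
observed = [3, 4, 2, 5]
forecast = [3, 5, 5, 4]
mae = mean_absolute_error(observed, forecast)
direction = directional_accuracy(observed, forecast)
```

Here the forecast gets the direction right at two of the three steps while being off by 1.25 points on average, which shows why the two measures are reported separately.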
13.
Int J Obes (Lond) ; 41(6): 926-934, 2017 06.
Article in English | MEDLINE | ID: mdl-28239165

ABSTRACT

BACKGROUND: While vascular risk factors including Western-styled diet and obesity are reported to induce cognitive decline and increase dementia risk, recent reports consistently suggest that compromised integrity of cerebrovascular blood-brain barrier (BBB) may have an important role in neurodegeneration and cognitive deficits. A number of studies report that elevated blood pressure increases the permeability of BBB. METHODS: In this study, we investigated the effects of antihypertensive agents, candesartan or ursodeoxycholic acid (UDCA), on BBB dysfunction and cognitive decline in wild-type mice maintained on high fat and fructose (HFF) diet for 24 weeks. RESULTS: HFF-fed mice showed significantly increased body weight, with elevated blood pressure, plasma insulin and glucose, compared with mice fed low-fat control chow. Concomitantly, significant disruption of BBB and cognitive decline were evident in the HFF-fed obese mice. Hypertension was completely prevented by the coprovision of candesartan or UDCA in mice maintained on HFF diet, while only candesartan significantly reduced the body weight compared with HFF-fed mice. Nevertheless, BBB dysfunction and cognitive decline remained unaffected by candesartan or UDCA. CONCLUSIONS: These data conclusively indicate that modulation of blood pressure and/or body weight may not be directly associated with BBB dysfunction and cognitive deficits in Western diet-induced obese mice, and hence antihypertensive agents may not be effective in preventing BBB disruption and cognitive decline. The findings may provide important mechanistic insights into obesity-associated cognitive decline and its therapy.


Subject(s)
Antihypertensive Agents/pharmacology , Blood-Brain Barrier/drug effects , Cognition Disorders/physiopathology , Diet, High-Fat/adverse effects , Hypertension/physiopathology , Obesity/physiopathology , Animals , Cognition Disorders/blood , Disease Models, Animal , Hypertension/blood , Hypertension/drug therapy , Male , Mice , Mice, Obese , Obesity/blood , Obesity/drug therapy
14.
Neural Comput ; 29(4): 867-887, 2017 04.
Article in English | MEDLINE | ID: mdl-28095194

ABSTRACT

Many previous proposals for adversarial training of deep neural nets have included directly modifying the gradient, training on a mix of original and adversarial examples, using contractive penalties, and approximately optimizing constrained adversarial objective functions. In this article, we show that these proposals are actually all instances of optimizing a general, regularized objective we call DataGrad. Our proposed DataGrad framework, which can be viewed as a deep extension of the layerwise contractive autoencoder penalty, cleanly simplifies prior work and easily allows extensions such as adversarial training with multitask cues. In our experiments, we find that the deep gradient regularization of DataGrad (which also has L1 and L2 flavors of regularization) outperforms alternative forms of regularization, including classical L1, L2, and multitask, on both the original data set and adversarial sets. Furthermore, we find that combining multitask optimization with DataGrad adversarial training results in the most robust performance.

15.
J Microsc ; 264(3): 321-333, 2016 12.
Article in English | MEDLINE | ID: mdl-27439177

ABSTRACT

Semiquantitative immunofluorescence microscopy has become a key methodology in biomedical research. Typical statistical workflows are considered in the context of avoiding pseudo-replication and marginalising experimental error. However, immunofluorescence microscopy naturally generates hierarchically structured data that can be leveraged to improve statistical power and enrich biological interpretation. Herein, we describe a robust distribution fitting procedure and compare several statistical tests, outlining their potential advantages/disadvantages in the context of biological interpretation. Further, we describe tractable procedures for power analysis that incorporates the underlying distribution, sample size and number of images captured per sample. The procedures outlined have significant potential for increasing understanding of biological processes and decreasing both ethical and financial burden through experimental optimization.


Subject(s)
Biostatistics , Microscopy, Fluorescence/methods , Animals , Female , Humans , Likelihood Functions , Rats , Rats, Sprague-Dawley
16.
Int J Obes (Lond) ; 40(10): 1523-1528, 2016 10.
Article in English | MEDLINE | ID: mdl-27460603

ABSTRACT

BACKGROUND/OBJECTIVES: State-specific obesity prevalence data are critical to public health efforts to address the childhood obesity epidemic. However, few states administer objectively measured body mass index (BMI) surveillance programs. This study reports state-specific childhood obesity prevalence by age and sex correcting for parent-reported child height and weight bias. SUBJECTS/METHODS: As part of the Childhood Obesity Intervention Cost Effectiveness Study (CHOICES), we developed childhood obesity prevalence estimates for states for the period 2005-2010 using data from the 2010 US Census and American Community Survey (ACS), 2003-2004 and 2007-2008 National Survey of Children's Health (NSCH) (n=133 213), and 2005-2010 National Health and Nutrition Examination Surveys (NHANES) (n=9377; ages 2-17). Measured height and weight data from NHANES were used to correct parent-report bias in NSCH using a non-parametric statistical matching algorithm. Model estimates were validated against surveillance data from five states (AR, FL, MA, PA and TN) that conduct censuses of children across a range of grades. RESULTS: Parent-reported height and weight resulted in the largest overestimation of childhood obesity in males ages 2-5 years (NSCH: 42.36% vs NHANES: 11.44%). The CHOICES model estimates for this group (12.81%) and for all age and sex categories were not statistically different from NHANES. Our modeled obesity prevalence aligned closely with measured data from five validation states, with a 0.64 percentage point mean difference (range: 0.23-1.39) and a high correlation coefficient (r=0.96, P=0.009). Estimated state-specific childhood obesity prevalence ranged from 11.0 to 20.4%. CONCLUSION: Uncorrected estimates of childhood obesity prevalence from NSCH vary widely from measured national data, from a 278% overestimate among males aged 2-5 years to a 44% underestimate among females aged 14-17 years. 
This study demonstrates the validity of the CHOICES matching methods to correct the bias of parent-reported BMI data and highlights the need for public release of more recent data from the 2011 to 2012 NSCH.


Subject(s)
Pediatric Obesity/epidemiology , Public Health Surveillance , Public Health , Self Report/standards , Adolescent , Body Mass Index , Child , Child, Preschool , Female , Humans , Male , Nutrition Surveys , Parents , Pediatric Obesity/prevention & control , Policy Making , Prevalence , United States/epidemiology
17.
Adv Appl Microbiol ; 95: 1-67, 2016.
Article in English | MEDLINE | ID: mdl-27261781

ABSTRACT

A major challenge facing agriculture in the 21st century is the need to increase the productivity of cultivated land while reducing the environmentally harmful consequences of mineral fertilization. The microorganisms thriving in association and interacting with plant roots, the plant microbiota, represent a potential resource of plant probiotic function, capable of conjugating crop productivity with sustainable management in agroecosystems. However, a limited knowledge of the organismal interactions occurring at the root-soil interface is currently hampering the development and use of beneficial plant-microbiota interactions in agriculture. Therefore, a comprehensive understanding of the recruitment cues of the plant microbiota and the molecular basis of nutrient turnover in the rhizosphere will be required to move toward efficient and sustainable crop nutrition. In this chapter, we will discuss recent insights into plant-microbiota interactions at the root-soil interface, illustrate the processes driving mineral dynamics in soil, and propose experimental avenues to further integrate the metabolic potential of the plant microbiota into crop management and breeding strategies for sustainable agricultural production.


Subject(s)
Bacteria/metabolism , Microbiota , Minerals/metabolism , Plant Roots/microbiology , Plants/microbiology , Bacteria/classification , Bacteria/isolation & purification , Bacterial Physiological Phenomena , Minerals/analysis , Rhizosphere , Soil Microbiology
18.
J Cheminform ; 7(Suppl 1 Text mining for chemistry and the CHEMDNER track): S12, 2015.
Article in English | MEDLINE | ID: mdl-25810769

ABSTRACT

BACKGROUND: As we are witnessing a great interest in identifying and extracting chemical entities in academic articles, many approaches have been proposed to solve this problem. In this work we describe a probabilistic framework that allows for the output of multiple information extraction systems to be combined in a systematic way. The identified entities are assigned a probability score that reflects the extractors' confidence, without the need for each individual extractor to generate a probability score. We quantitatively compared the performance of multiple chemical tokenizers to measure the effect of tokenization on extraction accuracy. Later, a single Conditional Random Fields (CRF) extractor that utilizes the best performing tokenizer is built using a unique collection of features such as word embeddings and Soundex codes, which, to the best of our knowledge, has not been explored in this context before. RESULTS: The ensemble of multiple extractors outperforms each extractor's individual performance during the CHEMDNER challenge. When the runs were optimized to favor recall, the ensemble approach achieved the second highest recall on unseen entities. As for the single CRF model with novel features, the extractor achieves an F1 score of 83.3% on the test set, without any post-processing or abbreviation matching. CONCLUSIONS: Ensemble information extraction is effective when multiple standalone extractors are to be used, and produces higher performance than individual off-the-shelf extractors. The novel features introduced in the single CRF model are sufficient to achieve a very competitive F1 score using a simple standalone extractor.

19.
J Cheminform ; 7(Suppl 1 Text mining for chemistry and the CHEMDNER track): S2, 2015.
Article in English | MEDLINE | ID: mdl-25810773

ABSTRACT

The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative of all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text was measured using an agreement study between annotators, obtaining a percentage agreement of 91%. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/.

20.
Equine Vet J ; 47(5): 510-8, 2015 Sep.
Article in English | MEDLINE | ID: mdl-24945608

ABSTRACT

For decades, researchers have been targeting prevention of Rhodococcus equi (Rhodococcus hoagii/Prescottella equi) by vaccination, and the horse breeding industry has supported the ongoing efforts by researchers to develop a safe and cost-effective vaccine to prevent disease in foals. Traditional vaccines including live, killed and attenuated (physical and chemical) vaccines have proved to be ineffective, and more modern molecular-based vaccines including the DNA plasmid, genetically attenuated and subunit vaccines have provided inadequate protection of foals. Newer, bacterial vector vaccines have recently shown promise for R. equi in the mouse model. This article describes the findings of key research in R. equi vaccine development and looks at alternative methods that may potentially be utilised.


Subject(s)
Actinomycetales Infections/veterinary , Bacterial Vaccines/immunology , Horse Diseases/prevention & control , Rhodococcus equi , Actinomycetales Infections/microbiology , Actinomycetales Infections/prevention & control , Animals , Horse Diseases/microbiology , Horses