A skin lesion hair mask dataset with fine-grained annotations.

Hossain, Sk Imran; Roy, Sudipta Singha; De Goër De Herve, Jocelyn; Mercer, Robert E; Mephu Nguifo, Engelbert.

Data Brief ; 48: 109249, 2023 Jun.

Article in English | MEDLINE | ID: mdl-37383821

ABSTRACT

Occlusion of skin lesions in dermoscopic images due to hair affects the performance of computer-assisted lesion analysis algorithms. Lesion analysis can benefit from digital hair removal or realistic hair simulation techniques. To assist in that process, we have created the largest publicly available skin lesion hair segmentation mask dataset by carefully annotating 500 dermoscopic images. Compared to the existing datasets, our dataset is free of non-hair artifacts like ruler markers, bubbles, and ink marks. The dataset is also less prone to over- and under-segmentation because of fine-grained annotations and quality checks from multiple independent annotators. To create the dataset, first, we collected 500 copyright-free CC0-licensed dermoscopic images covering different hair patterns. Second, we trained a deep learning hair segmentation model on a publicly available weakly annotated dataset. Third, we extracted hair masks for the selected 500 images using the segmentation model. Finally, we manually corrected all the segmentation errors and verified the annotations by superimposing the annotated masks on top of the dermoscopic images. Multiple annotators were involved in the annotation and verification process to make the annotations as error-free as possible. The prepared dataset will be useful for benchmarking and training hair segmentation algorithms as well as creating realistic hair augmentation systems.
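The verification step described in this abstract, superimposing a hair mask on its dermoscopic image, can be illustrated with a minimal sketch. This is not the authors' tooling: the function name `overlay_mask`, the nested-list image representation, and the `color` and `alpha` parameters are all assumptions made for illustration, and a real pipeline would more likely use an image library.

```python
def overlay_mask(image, mask, color=(255, 0, 0), alpha=0.5):
    """Blend `color` into every pixel whose mask value is nonzero.

    `image` is a list of rows of (R, G, B) tuples and `mask` is a list of
    rows of 0/1 values with the same height and width (hypothetical layout).
    """
    if len(image) != len(mask) or any(
        len(img_row) != len(mask_row) for img_row, mask_row in zip(image, mask)
    ):
        raise ValueError("image and mask must have the same height and width")

    blended = []
    for img_row, mask_row in zip(image, mask):
        out_row = []
        for pixel, is_hair in zip(img_row, mask_row):
            if is_hair:
                # Weighted average of the original pixel and the highlight color.
                pixel = tuple(
                    round((1 - alpha) * p + alpha * c) for p, c in zip(pixel, color)
                )
            out_row.append(tuple(pixel))
        blended.append(out_row)
    return blended


if __name__ == "__main__":
    gray = (100, 100, 100)
    image = [[gray, gray], [gray, gray]]
    mask = [[1, 0], [0, 0]]  # only the top-left pixel is marked as hair
    print(overlay_mask(image, mask, color=(200, 0, 0)))
```

Highlighted pixels that fall on skin, or hair strands left unhighlighted, are the over- and under-segmentation errors an annotator would look for in such an overlay.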

Chemical identification and indexing in full-text articles: an overview of the NLM-Chem track at BioCreative VII.

Leaman, Robert; Islamaj, Rezarta; Adams, Virginia; Alliheedi, Mohammed A; Almeida, João Rafael; Antunes, Rui; Bevan, Robert; Chang, Yung-Chun; Erdengasileng, Arslan; Hodgskiss, Matthew; Ida, Ryuki; Kim, Hyunjae; Li, Keqiao; Mercer, Robert E; Mertová, Lukrécia; Mobasher, Ghadeer; Shin, Hoo-Chang; Sung, Mujeen; Tsujimura, Tomoki; Yeh, Wen-Chao; Lu, Zhiyong.

Database (Oxford) ; 2023, 2023 Mar 07.

Article in English | MEDLINE | ID: mdl-36882099

ABSTRACT

The BioCreative National Library of Medicine (NLM)-Chem track calls for a community effort to fine-tune automated recognition of chemical names in the biomedical literature. Chemicals are one of the most searched biomedical entities in PubMed, and, as highlighted during the coronavirus disease 2019 pandemic, their identification may significantly advance research in multiple biomedical subfields. While previous community challenges focused on identifying chemical names mentioned in titles and abstracts, the full text contains valuable additional detail. We, therefore, organized the BioCreative NLM-Chem track as a community effort to address automated chemical entity recognition in full-text articles. The track consisted of two tasks: (i) chemical identification and (ii) chemical indexing. The chemical identification task required predicting all chemicals mentioned in recently published full-text articles, both span [i.e. named entity recognition (NER)] and normalization (i.e. entity linking), using Medical Subject Headings (MeSH). The chemical indexing task required identifying which chemicals reflect topics for each article and should therefore appear in the listing of MeSH terms for the document in the MEDLINE article indexing. This manuscript summarizes the BioCreative NLM-Chem track and post-challenge experiments. We received a total of 85 submissions from 17 teams worldwide. The highest performance achieved for the chemical identification task was 0.8672 F-score (0.8759 precision and 0.8587 recall) for strict NER performance and 0.8136 F-score (0.8621 precision and 0.7702 recall) for strict normalization performance. The highest performance achieved for the chemical indexing task was 0.6073 F-score (0.7417 precision and 0.5141 recall).
This community challenge demonstrated that (i) the current substantial achievements in deep learning technologies can be utilized to improve automated prediction accuracy further and (ii) the chemical indexing task is substantially more challenging. We look forward to further developing biomedical text-mining methods to respond to the rapid growth of biomedical literature. The NLM-Chem track dataset and other challenge materials are publicly available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BC7-NLM-Chem-track/. Database URL: https://ftp.ncbi.nlm.nih.gov/pub/lu/BC7-NLM-Chem-track/.
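The F-scores quoted in this abstract are consistent with the balanced F-score, the harmonic mean of precision and recall. The sketch below recomputes them from the reported precision and recall values; the helper name `f_score` is an assumption for illustration and is not taken from the challenge's evaluation code.

```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (task, precision, recall, F-score as reported in the abstract)
reported = [
    ("strict NER", 0.8759, 0.8587, 0.8672),
    ("strict normalization", 0.8621, 0.7702, 0.8136),
    ("chemical indexing", 0.7417, 0.5141, 0.6073),
]

if __name__ == "__main__":
    for task, precision, recall, f_reported in reported:
        f_computed = round(f_score(precision, recall), 4)
        print(f"{task}: computed {f_computed}, reported {f_reported}")
```

The indexing result shows how the harmonic mean penalizes imbalance: a precision of 0.7417 cannot compensate for a recall of 0.5141.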

Subject(s)

COVID-19 , United States , Humans , National Library of Medicine (U.S.) , Data Mining , Databases, Factual , MEDLINE

Identifying genotype-phenotype relationships in biomedical text.

Khordad, Maryam; Mercer, Robert E.

J Biomed Semantics ; 8(1): 57, 2017 Dec 06.

Article in English | MEDLINE | ID: mdl-29212530

ABSTRACT

BACKGROUND: One important type of information contained in biomedical research literature is the newly discovered relationships between phenotypes and genotypes. Because of the large quantity of literature, a reliable automatic system to identify this information for future curation is essential. Such a system provides important and up-to-date data for database construction and updating, and even text summarization. In this paper we present a machine learning method to identify these genotype-phenotype relationships. No large human-annotated corpus of genotype-phenotype relationships currently exists, so a semi-automatic approach has been used to annotate a small labelled training set, and a self-training method is proposed to annotate more sentences and enlarge the training set. RESULTS: The resulting machine-learned model was evaluated using a separate test set annotated by an expert. The results show that using only the small training set in a supervised learning method achieves good results (precision: 76.47, recall: 77.61, F-measure: 77.03), which are improved by applying a self-training method (precision: 77.70, recall: 77.84, F-measure: 77.77). CONCLUSIONS: Relationships between genotypes and phenotypes are biomedical information pivotal to the understanding of a patient's situation. Our proposed method is the first attempt to make a specialized system to identify genotype-phenotype relationships in biomedical literature. We achieve good results using a small training set. To improve the results, other linguistic contexts need to be explored and an appropriately enlarged training set is required.
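Self-training, as mentioned in this abstract, generally means training on a small labelled set, labelling unlabelled examples with the resulting model, and adding only the confident predictions back into the training set. The sketch below shows that generic loop and should not be read as the authors' system: the one-dimensional features, the toy centroid classifier `train_centroids`, the confidence formula, and the `threshold` and `max_rounds` parameters are all hypothetical stand-ins.

```python
def train_centroids(labeled):
    """Toy classifier (hypothetical): one mean feature value per class."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: sums[y] / counts[y] for y in sums}

    def predict(x):
        # Rank classes by distance; confidence compares the two nearest.
        ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
        best = ranked[0]
        if len(ranked) == 1:
            return best, 1.0
        d_best = abs(x - centroids[best])
        d_next = abs(x - centroids[ranked[1]])
        total = d_best + d_next
        return best, (1.0 if total == 0 else d_next / total)

    return predict


def self_train(labeled, unlabeled, train=train_centroids,
               threshold=0.8, max_rounds=10):
    """Grow the labelled set with confident predictions, then retrain."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        newly_labeled, remaining = [], []
        for x in pool:
            label, confidence = model(x)
            if confidence >= threshold:
                newly_labeled.append((x, label))
            else:
                remaining.append(x)
        if not newly_labeled:
            break  # nothing confident enough; stop early
        labeled.extend(newly_labeled)
        pool = remaining
    return train(labeled), labeled, pool


if __name__ == "__main__":
    seed = [(0.0, "neg"), (10.0, "pos")]
    model, grown, left_over = self_train(seed, [1.0, 9.0, 5.0])
    print(len(grown), left_over, model(2.0)[0])
```

The confidence threshold controls the trade-off: a low threshold enlarges the training set faster but risks reinforcing early labelling mistakes.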

Subject(s)

Biological Ontologies , Genotype , Machine Learning , Phenotype , Biomedical Research , Databases, Factual
