Results 1 - 17 of 17
1.
Mol Ecol ; 32(10): 2551-2564, 2023 05.
Article in English | MEDLINE | ID: mdl-36151926

ABSTRACT

The oral microbiota is a highly complex and diversified part of the human microbiome. Being located at the interface between the human body and the exterior environment, this microbiota can deepen our understanding of the environmental impacts on the global status of human health. This research topic has been well addressed in Westernized populations, but these populations only represent a fraction of human diversity. Papua New Guinea hosts very diverse environments and one of the most unique human biological diversities worldwide. In this study we performed the first known characterization of the oral microbiome in 85 Papua New Guinean individuals living in different environments, using a qualitative and quantitative approach. We found a significant geographical structure of the Papua New Guinean oral microbiome, especially in the groups most isolated from urban spaces. In comparison to other global populations, two bacterial genera related to iron absorption were significantly more abundant in Papua New Guineans and Aboriginal Australians, which suggests a shared oral microbiome signature. Further studies will be needed to confirm and explore this possible region-specific oral microbiome profile.


Subject(s)
Microbiota , Mouth , Humans , Australia , Geography , Microbiota/genetics , Papua New Guinea , Mouth/microbiology
2.
Sci Rep ; 12(1): 10587, 2022 06 22.
Article in English | MEDLINE | ID: mdl-35732850

ABSTRACT

The timely identification of cohort participants at higher risk for attrition is important for earlier interventions and efficient use of research resources. Machine learning may have advantages over the conventional approaches to improve discrimination by analysing complex interactions among predictors. We developed predictive models of attrition applying a conventional regression model and different machine learning methods. A total of 542 very preterm (< 32 gestational weeks) infants born in Portugal as part of the European Effective Perinatal Intensive Care in Europe (EPICE) cohort were included. We tested a model with a fixed number of predictors (Baseline) and a second with a dynamic number of variables added from each follow-up (Incremental). Eight classification methods were applied: AdaBoost, Artificial Neural Networks, Functional Trees, J48, J48Consolidated, K-Nearest Neighbours, Random Forest and Logistic Regression. Performance was compared using AUC-PR (Area Under the Curve-Precision Recall), Accuracy, Sensitivity and F-measure. Attrition rates at the four follow-ups were, respectively, 16%, 25%, 13% and 17%. Both models demonstrated good predictive performance, with AUC-PR ranging from 69 to 94.1 in the Baseline model and from 72.5 to 97.1 in the Incremental model. Of the whole set of methods, Random Forest presented the best performance at all follow-ups [AUC-PR1: 94.1 (2.0); AUC-PR2: 91.2 (1.2); AUC-PR3: 97.1 (1.0); AUC-PR4: 96.5 (1.7)]. Logistic Regression performed well below Random Forest. The top-ranked predictors were common to both models in all follow-ups: birthweight, gestational age, maternal age, and length of hospital stay. Random Forest presented the highest capacity for prediction and provided interpretable predictors. Researchers involved in cohorts can benefit from our robust models to prepare for and prevent loss to follow-up by directing efforts toward individuals at higher risk.


Subject(s)
Infant, Premature , Machine Learning , Cohort Studies , Female , Fetal Growth Retardation , Humans , Infant, Newborn , Logistic Models , Neural Networks, Computer
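
The AUC-PR metric named in the abstract above can be approximated as average precision over a ranked list of predictions. The following is a minimal sketch, not the study's own code: the function name `average_precision` and the toy labels and scores are assumptions made for illustration, and the abstract does not say which estimator of the area the authors used.

```python
def average_precision(labels, scores):
    """Approximate the area under the precision-recall curve.

    labels: 0/1 ground-truth values (1 = participant lost to follow-up;
            this coding is a hypothetical choice for the example).
    scores: predicted probabilities, same length as labels.
    """
    # Rank cases from the highest to the lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(label for _, label in ranked)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / total_positives


# Toy data, invented for illustration only.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
ap = average_precision(toy_labels, toy_scores)
```

The abstract reports AUC-PR on a 0 to 100 scale, so a value computed this way would be multiplied by 100 before comparison.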
3.
Microorganisms ; 9(2)2021 Jan 25.
Article in English | MEDLINE | ID: mdl-33503840

ABSTRACT

The continuous characterization of genome-wide diversity in population and case-cohort samples, allied to the development of new algorithms, is shedding light on host ancestry impact and selection events on various infectious diseases. Especially interesting are the long-standing associations between humans and certain bacteria, such as the case of Helicobacter pylori, which could have been strong drivers of adaptation leading to coevolution. Some evidence on admixed gastric cancer cohorts has been suggested as supporting Homo-Helicobacter coevolution, but reliable experimental data that control both the bacterium and the host ancestries are lacking. Here, we conducted the first in vitro coinfection assays with dual human- and bacterium-matched and -mismatched ancestries, in African and European backgrounds, to evaluate the genome-wide gene expression host response to H. pylori. Our results showed that: (1) the host response to H. pylori infection was greatly shaped by the human ancestry, with variability on innate immune system and metabolism; (2) African human ancestry showed signs of coevolution with H. pylori while European ancestry appeared to be maladapted; and (3) mismatched ancestry did not seem to be an important differentiator of gene expression at the initial stages of infection as assayed here.

4.
J Biomed Inform ; 114: 103669, 2021 02.
Article in English | MEDLINE | ID: mdl-33359111

ABSTRACT

Over the last decades clinical research has been driven by informatics changes nourished by distinct research endeavors. Inherent to this evolution, several issues have been the focus of a variety of studies: multi-location patient data access, interoperability between terminological and classification systems and clinical practice and records harmonization. Having these problems in mind, the Data Safe Haven paradigm emerged to promote a newborn architecture, better reasoning and safe and easy access to distinct Clinical Data Repositories. The aim of this study is to present a novel solution for clinical search harmonization within a safe environment, making use of a hybrid coding taxonomy that enables researchers to collect information from multiple repositories based on a clinical domain query definition. Results show that it is possible to query multiple repositories using a single query definition based on clinical domains and the capabilities of the Unified Medical Language System, although it leads to deterioration of the framework response times. Participants of a Focus Group and a System Usability Scale questionnaire rated the framework with a median value of 72.5, indicating the hybrid coding taxonomy could be enriched with additional metadata to further improve the refinement of the results and enable the possibility of using this system as a data quality tagging mechanism.


Subject(s)
Metadata , Unified Medical Language System , Humans , Infant, Newborn
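
The System Usability Scale result quoted in the abstract above is a median of per-participant scores. A minimal sketch of the standard SUS scoring rule follows; the abstract does not describe how the authors scored their questionnaire, so applying the standard rule here is an assumption, and the function name `sus_score` and the responses are invented for illustration.

```python
import statistics


def sus_score(responses):
    """Score one System Usability Scale questionnaire (ten items, each 1-5)."""
    if len(responses) != 10 or any(r < 1 or r > 5 for r in responses):
        raise ValueError("expected ten responses in the range 1-5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1  # odd items are positively worded
        else:
            total += 5 - response  # even items are negatively worded
    return total * 2.5  # rescale the 0-40 sum to 0-100


# Hypothetical answers from three participants, not data from the study.
toy_responses = [
    [4, 2, 4, 2, 4, 2, 4, 2, 4, 2],
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
]
toy_scores = [sus_score(r) for r in toy_responses]
median_score = statistics.median(toy_scores)
```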
5.
BMC Med Inform Decis Mak ; 20(Suppl 5): 141, 2020 08 20.
Article in English | MEDLINE | ID: mdl-32819347

ABSTRACT

BACKGROUND: As of today, cancer is still one of the most prevalent and high-mortality diseases, summing more than 9 million deaths in 2018. This has motivated researchers to study the application of machine learning-based solutions for cancer detection to accelerate its diagnosis and help its prevention. Among several approaches, one is to automatically classify tumor samples through their gene expression analysis. METHODS: In this work, we aim to distinguish five different types of cancer through RNA-Seq datasets: thyroid, skin, stomach, breast, and lung. To do so, we have adopted a previously described methodology, with which we compare the performance of 3 different autoencoders (AEs) used as a deep neural network weight initialization technique. Our experiments consist in assessing two different approaches when training the classification model - fixing the weights after pre-training the AEs, or allowing fine-tuning of the entire network - and two different strategies for embedding the AEs into the classification network, namely by only importing the encoding layers, or by inserting the complete AE. We then study how varying the number of layers in the first strategy, the AEs' latent vector dimension, and the imputation technique in the data preprocessing step impacts the network's overall classification performance. Finally, with the goal of assessing how well this pipeline generalizes, we apply the same methodology to two additional datasets that include features extracted from images of malaria thin blood smears, and breast masses cell nuclei. We also discard the possibility of overfitting by using held-out test sets in the image datasets. RESULTS: The methodology attained good overall results for both RNA-Seq and image extracted data. We outperformed the established baseline for all the considered datasets, achieving an average F1 score of 99.03, 89.95, and 98.84 and an MCC of 0.99, 0.84, and 0.98, for the RNA-Seq (when detecting thyroid cancer), the Malaria, and the Wisconsin Breast Cancer data, respectively. CONCLUSIONS: We observed that the approach of fine-tuning the weights of the top layers imported from the AE reached higher results, for all the presented experiments, and all the considered datasets. We outperformed all the previously reported results when comparing to the established baselines.


Subject(s)
Diagnostic Imaging , Early Detection of Cancer/methods , Machine Learning , Neoplasms/diagnosis , Neural Networks, Computer , Deep Learning , Gene Expression , Humans
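
The two metrics reported in the abstract above, the F1 score and the Matthews correlation coefficient (MCC), can both be derived from a binary confusion matrix. The sketch below uses the textbook definitions; the function name `f1_and_mcc` and the counts are hypothetical and do not come from the paper.

```python
import math


def f1_and_mcc(tp, fp, fn, tn):
    """Return (F1, MCC) for a binary confusion matrix."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)

    # MCC is conventionally set to 0 when any marginal total is zero.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return f1, mcc


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
f1, mcc = f1_and_mcc(tp=90, fp=10, fn=10, tn=90)
```

The abstract reports F1 on a 0 to 100 scale and MCC on a -1 to 1 scale, so an F1 value computed this way would be multiplied by 100 before comparison.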
6.
Microorganisms ; 8(8)2020 Aug 06.
Article in English | MEDLINE | ID: mdl-32781641

ABSTRACT

The human gastrointestinal tract harbors approximately 100 trillion microorganisms with different microbial compositions across geographic locations. In this work, we used RNASeq data from stomach samples of non-disease (164 individuals from European ancestry) and gastric cancer patients (137 from Europe and Asia) from public databases. Although these data were intended to characterize the human expression profiles, they allowed for a reliable inference of the microbiome composition, as confirmed from measures such as the genus coverage, richness and evenness. The microbiome diversity (weighted UniFrac distances) in gastric cancer mimics host diversity across the world, with European gastric microbiome profiles clustering together, distinct from Asian ones. Despite the confirmed loss of microbiome diversity from a healthy status to a cancer status, the structured profile was still recognized in the disease condition. In concordance with the parallel host-bacteria population structure, we found 16 human loci (non-synonymous variants) in the European-descendent cohorts that were significantly associated with specific genera abundance. These microbiome quantitative trait loci display heterogeneity between population groups, being mainly linked to the immune system or cellular features that may play a role in enabling microbe colonization and inflammation.
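
The abstract above mentions genus richness and evenness as checks on the inferred microbiome composition, without naming the indices used. As a minimal sketch, assuming richness is the number of detected genera and evenness is Pielou's index (Shannon entropy divided by the log of richness), these could be computed as follows; the function name and the read counts are invented for illustration.

```python
import math


def richness_and_evenness(genus_counts):
    """Return (richness, Shannon entropy, Pielou evenness) for one sample.

    genus_counts: mapping of genus name to read count.
    """
    counts = [c for c in genus_counts.values() if c > 0]
    richness = len(counts)
    total = sum(counts)
    # Shannon entropy over the relative abundances (natural log).
    shannon = -sum((c / total) * math.log(c / total) for c in counts)
    # Pielou evenness is undefined for fewer than two genera; use 0.0 here.
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return richness, shannon, evenness


# Hypothetical read counts for one stomach sample.
toy_sample = {
    "Helicobacter": 10,
    "Streptococcus": 10,
    "Prevotella": 10,
    "Neisseria": 10,
}
richness, shannon, evenness = richness_and_evenness(toy_sample)
```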

7.
Interdiscip Sci ; 11(1): 45-56, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30707359

ABSTRACT

Protein-protein interaction (PPI) data is essential to elucidate the complex molecular relationships in living systems, and thus understand the biological functions at cellular and systems levels. The complete map of PPIs that can occur in a living organism is called the interactome. For animals, PPI data is stored in multiple databases (e.g., BioGRID, CCSB, DroID, FlyBase, HIPPIE, HitPredict, HomoMINT, INstruct, Interactome3D, mentha, MINT, and PINA2) with different formats. This makes PPI comparisons difficult to perform, especially between species, since orthologous proteins may have different names. Moreover, there is only a partial overlap between databases, even when considering a single species. The EvoPPI ( http://evoppi.i3s.up.pt ) web application presented in this paper allows comparison of data from the different databases at the species level, or between species using a BLAST approach. We show its usefulness by performing a comparative study of the interactome of the nine polyglutamine (polyQ) disease proteins, namely androgen receptor (AR), atrophin-1 (ATN1), ataxin 1 (ATXN1), ataxin 2 (ATXN2), ataxin 3 (ATXN3), ataxin 7 (ATXN7), calcium voltage-gated channel subunit alpha1 A (CACNA1A), Huntingtin (HTT), and TATA-binding protein (TBP). Here we show that none of the human interactors of these proteins is common to all nine interactomes. Only 15 proteins are common to at least 4 of these polyQ disease proteins, and 40% of these are involved in ubiquitin protein ligase-binding function. The results obtained in this study suggest that polyQ disease proteins are involved in different functional networks. Comparisons with Mus musculus PPIs are also made for AR and TBP, using the EvoPPI BLAST search approach (a unique feature of EvoPPI), with the goal of understanding why there is a significant excess of common interactors for these proteins in humans.


Subject(s)
Neurodegenerative Diseases/metabolism , Peptides/metabolism , Protein Interaction Maps , Humans , Internet , Protein Binding
8.
Internet Interv ; 12: 176-180, 2018 Jun.
Article in English | MEDLINE | ID: mdl-30135781

ABSTRACT

INTRODUCTION: Clinical trials of blended Internet-based treatments deliver a wealth of data from various sources, such as self-report questionnaires, diagnostic interviews, treatment platform log files and Ecological Momentary Assessments (EMA). Mining these complex data for clinically relevant patterns is a daunting task for which no definitive best method exists. In this paper, we explore the expressive power of the multi-relational Inductive Logic Programming (ILP) data mining approach, using combined trial data of the EU E-COMPARED depression trial. METHODS: We explored the capability of ILP to handle and combine (implicit) multiple relationships in the E-COMPARED data. This data set has the following features that favor ILP analysis: 1) Time reasoning is involved; 2) there is a reasonable amount of explicit useful relations to be analyzed; 3) ILP is capable of building comprehensible models that might be perceived as putative explanations by domain experts; 4) both numerical and statistical models may coexist within ILP models if necessary. In our analyses, we focused on scores of the PHQ-8 self-report questionnaire (which taps depressive symptom severity), and on EMA of mood and various other clinically relevant factors. Both measures were administered during treatment, which lasted between 9 and 16 weeks. RESULTS: E-COMPARED trial data revealed different individual improvement patterns: PHQ-8 scores suggested that some individuals improved quickly during the first weeks of the treatment, while others improved at a (much) slower pace, or not at all. Combining self-reported Ecological Momentary Assessments (EMA), PHQ-8 scores and log data about the usage of the ICT4D platform in the context of blended care, we set out to unveil possible causes for these different trajectories.
DISCUSSION: This work complements other studies into alternative data mining approaches to E-COMPARED trial data analysis, which are all aimed to identify clinically meaningful predictors of system use and treatment outcome. Strengths and limitations of the ILP approach given this objective will be discussed.

9.
Hum Mutat ; 36(11): 1100-11, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26252938

ABSTRACT

A high-resolution mtDNA phylogenetic tree allowed us to look backward in time to investigate purifying selection. Purifying selection was very strong in the last 2,500 years, continuously eliminating pathogenic mutations back until the end of the Younger Dryas (∼11,000 years ago), when a large population expansion likely relaxed selection pressure. This was preceded by a phase of stable selection until another relaxation occurred in the out-of-Africa migration. Demography and selection are closely related: expansions led to relaxation of selection and higher pathogenicity mutations significantly decreased the growth of descendants. The only detectable positive selection was the recurrence of highly pathogenic nonsynonymous mutations (m.3394T>C-m.3397A>G-m.3398T>C) at interior branches of the tree, preventing the formation of a dinucleotide STR (TATATA) in the MT-ND1 gene. At the most recent time scale in 124 mother-children transmissions, purifying selection was detectable through the loss of mtDNA variants with high predicted pathogenicity. A few haplogroup-defining sites were also heteroplasmic, agreeing with a significant propensity in 349 positions in the phylogenetic tree to revert back to the ancestral variant. This nonrandom mutation property explains the observation of heteroplasmic mutations at some haplogroup-defining sites in sequencing datasets, which may not indicate poor quality as has been claimed.


Subject(s)
DNA, Mitochondrial , Genetics, Population , Mutation , Selection, Genetic , Alleles , Computational Biology/methods , Evolution, Molecular , Family , Female , Humans , Male , Phylogeny
10.
J Integr Bioinform ; 10(3): 231, 2013 Nov 14.
Article in English | MEDLINE | ID: mdl-24231145

ABSTRACT

Transposable Elements (TEs) are sequences of DNA that move and transpose within a genome. TEs, as mutation agents, are quite important for their role in both genome alteration diseases and species evolution. Several tools have been developed to discover and annotate TEs but no single tool achieves good results on all different types of TEs. In this paper we evaluate the performance of several TE detection and annotation tools and investigate whether Machine Learning techniques can be used to improve their overall detection accuracy. The results of an in silico evaluation of TE detection and annotation tools indicate that their performance can be improved by using classifiers constructed by machine learning.


Subject(s)
DNA Transposable Elements , Algorithms , Artificial Intelligence , Evolution, Molecular
11.
Int J Data Min Bioinform ; 6(6): 571-84, 2012.
Article in English | MEDLINE | ID: mdl-23356008

ABSTRACT

The functions of proteins in living organisms are related to their 3-D structure, which is known to be ultimately determined by their linear sequence of amino acids that together form these macromolecules. It is, therefore, of great importance to be able to understand and predict how the protein 3D-structure arises from a particular linear sequence of amino acids. In this paper we report the application of Machine Learning methods to predict, with high values of accuracy, the secondary structure of proteins, namely alpha-helices and beta-sheets, which are intermediate levels of the local structure.


Subject(s)
Protein Structure, Secondary , Proteins/chemistry , Algorithms , Amino Acids/chemistry , Artificial Intelligence , Databases, Factual
12.
J Integr Bioinform ; 8(3): 182, 2011 Sep 16.
Article in English | MEDLINE | ID: mdl-21926445

ABSTRACT

It has been recognized that the development of new therapeutic drugs is a complex and expensive process. A large number of factors affect the activity in vivo of putative candidate molecules and the propensity for causing adverse and toxic effects is recognized as one of the major hurdles behind the current "target-rich, lead-poor" scenario. Structure-Activity Relationship (SAR) studies, using relational Machine Learning (ML) algorithms, have already been shown to be very useful in the complex process of rational drug design. Despite the ML successes, human expertise is still of the utmost importance in the drug development process. An iterative process and tight integration between the models developed by ML algorithms and the know-how of medicinal chemistry experts would be a very useful symbiotic approach. In this paper we describe a software tool that achieves that goal--iLogCHEM. The tool allows the use of Relational Learners in the task of identifying molecules or molecular fragments with potential to produce toxic effects, and thus helps in streamlining drug design in silico. It also allows the expert to guide the search for useful molecules without the need to know the details of the algorithms used. The models produced by the algorithms may be visualized using a graphical interface that is in common use amongst researchers in structural biology and medicinal chemistry. The graphical interface enables the expert to provide feedback to the learning system. The developed tool also has facilities to handle the similarity bias typical of large chemical databases. For that purpose the user can filter out similar compounds when assembling a data set. Additionally, we propose ways of providing background knowledge for Relational Learners using the results of Graph Mining algorithms.


Subject(s)
Algorithms , Artificial Intelligence , Drug Design , Drug-Related Side Effects and Adverse Reactions , Pharmaceutical Preparations/chemistry , Animals , Humans , Structure-Activity Relationship
13.
Int J Legal Med ; 125(5): 629-36, 2011 Sep.
Article in English | MEDLINE | ID: mdl-20552217

ABSTRACT

Because of their sensitivity and high level of discrimination, short tandem repeat (STR) marker systems are currently the method of choice in routine forensic casework and data banking, usually in multiplexes up to 15-17 loci. Constraints related to sample amount and quality, frequently encountered in forensic casework, will not allow this picture to change in the near future, notwithstanding the technological developments. In this study, we present a free online calculator named PopAffiliator ( http://cracs.fc.up.pt/popaffiliator ) for individual population affiliation in the three main population groups, Eurasian, East Asian and sub-Saharan African, based on genotype profiles for the common set of STRs used in forensics. This calculator performs affiliation based on a model constructed using machine learning techniques. The model was constructed using a data set of approximately fifteen thousand individuals collected for this work. The accuracy of individual population affiliation is approximately 86%, showing that the common set of STRs routinely used in forensics provides a considerable amount of information for population assignment, in addition to being excellent for individual identification.


Subject(s)
Computers/legislation & jurisprudence , Forensic Genetics/instrumentation , Forensic Genetics/legislation & jurisprudence , Genetic Markers/genetics , Genetics, Population/legislation & jurisprudence , Genotype , Microsatellite Repeats/genetics , Population Groups/genetics , Artificial Intelligence , Gene Frequency/genetics , Humans
14.
J Theor Biol ; 271(1): 136-44, 2011 Feb 21.
Article in English | MEDLINE | ID: mdl-21130100

ABSTRACT

A statistical approach has been applied to analyse primary structure patterns at inner positions of α-helices in proteins. A systematic survey was carried out in a recent sample of non-redundant proteins selected from the Protein Data Bank, which were used to analyse α-helix structures for amino acid pairing patterns. Only residues more than three positions apart from both termini of the α-helix were considered as inner. Amino acid pairings i, i+k (k=1, 2, 3, 4, 5), were analysed and the corresponding 20×20 matrices of relative global propensities were constructed. An analysis of (i, i+4, i+8) and (i, i+3, i+4) triplet patterns was also performed. These analyses yielded information on a series of amino acid patterns (pairings and triplets) showing either high or low preference for α-helical motifs and suggested a novel approach to protein alphabet reduction. In addition, it has been shown that the individual amino acid propensities are not enough to define the statistical distribution of these patterns. Global pair propensities also depend on the type of pattern, its composition and orientation in the protein sequence. The data presented should prove useful for obtaining and refining predictive rules which can further the development and fine-tuning of protein structure prediction algorithms and tools.


Subject(s)
Amino Acids/chemistry , Protein Structure, Secondary , Proteins/chemistry , Algorithms , Databases, Protein , Protein Folding
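
The i, i+k pairing analysis described in the abstract above can be illustrated with a short sketch. The abstract does not define "relative global propensity", so the observed-over-expected ratio used here is an assumption, as are the function name `pair_propensities` and the toy sequence; a value above 1 would mean the pair occurs more often than the single-residue frequencies predict.

```python
from collections import Counter


def pair_propensities(helix_sequences, k):
    """Observed/expected ratio for residue pairs at positions (i, i+k).

    helix_sequences: strings of one-letter amino acid codes, each taken
                     from the inner positions of one helix.
    """
    pair_counts = Counter()
    residue_counts = Counter()
    for seq in helix_sequences:
        residue_counts.update(seq)
        for i in range(len(seq) - k):
            pair_counts[(seq[i], seq[i + k])] += 1

    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())
    propensities = {}
    for (first, second), count in pair_counts.items():
        observed = count / total_pairs
        # Expected pair frequency if the two positions were independent.
        expected = (residue_counts[first] / total_residues) * (
            residue_counts[second] / total_residues
        )
        propensities[(first, second)] = observed / expected
    return propensities


# Toy helix fragment, invented for illustration only.
toy_propensities = pair_propensities(["AEAE"], k=2)
```

Running the function for k = 1 to 5 over all 20 amino acids would fill matrices of the 20×20 shape the abstract mentions.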
16.
Proteins ; 70(1): 188-96, 2008 Jan 01.
Article in English | MEDLINE | ID: mdl-17654550

ABSTRACT

A systematic survey was carried out in an unbiased sample of 815 protein chains with a maximum of 20% homology selected from the Protein Data Bank, whose structures were solved at a resolution higher than 1.6 Å and with an R-factor lower than 25%. A set of 5556 subsequences with alpha-helix or 3(10)-helix motifs was extracted from the protein chains considered. Global and local propensities were then calculated for all possible amino acid pairs of the type (i, i + 1), (i, i + 2), (i, i + 3), and (i, i + 4), starting at the relevant helical positions N1, N2, N3, C3, C2, C1, and N-int (interior positions), and also at the first nonhelical positions in both termini of the helices, namely, N-cap and C-cap. The statistical analysis of the propensity values has shown that pairing is significantly dependent on the type of the amino acids and on the position of the pair. A few sequences of three and four amino acids were selected and their high prevalence in helices is outlined in this work. The Glu-Lys-Tyr-Pro sequence shows a peculiar distribution in proteins, which may suggest a relevant structural role in alpha-helices when Pro is located at the C-cap position. A bioinformatics tool was developed, which automatically and periodically updates the results and makes them available on a web site.


Subject(s)
Amino Acids/chemistry , Proteins/chemistry , Hydrogen Bonding , Protein Folding , Protein Structure, Secondary
17.
Rev Port Pneumol ; 12(5): 503-24, 2006.
Article in English, Portuguese | MEDLINE | ID: mdl-17117322

ABSTRACT

CT-guided Percutaneous Transthoracic Biopsies (PTB) performed in the Radiology Department of Garcia de Orta Hospital between 2002 and 2004 to evaluate undetermined pulmonary lesions were retrospectively analysed. 89 fine needle aspiration biopsies (FNAB) and 13 core needle biopsies (CNB) were performed on 92 patients (67 men, mean age: 64.4 years). 82 lesions (89%) were nodular lesions (mean diameter: 3.8 ± 1.7 cm, 65 peripheral). We did not observe complications among patients who underwent CNB; minor complications and pneumothorax requiring drainage occurred in 11 FNAB. 72 FNAB were considered adequate for cytology diagnosis; 72% of them were positive for malignancy. All CNB were adequate and conclusive. Of the 7 CNB performed on patients with previous FNAB, 3 allowed a better histological characterization and in 3 cases of inadequate FNAB, CNB was conclusive. All malignant lesions were nodules: 20 adenocarcinoma, 13 non-small cell lung cancer (NSCLC), 10 epidermoid tumours, 5 small-cell lung cancer, 2 carcinoids, 1 bronchiolo alveolar carcinoma, 1 malignant mesothelioma and 8 metastases. Unspecific/inflammatory lesions (n=5) were the most frequent benign lesions. Malignant lesions were more prevalent in older patients (p=0.007) and were larger (p=0.006). Spiculated and lobulated contour (p=0.05) were more prevalent in malignant lesions while regular contour was more frequent among benign lesions (p=0.0001). Gender, smoking, location, pleural tag, homogenous attenuation, cavitation, calcification, necrosis and air bronchogram did not differ significantly between benign and malignant nodules. This study shows that CT-guided PTB is a safe and effective procedure in the evaluation of undetermined pulmonary lesions.


Subject(s)
Biopsy, Needle/methods , Lung Diseases/diagnosis , Lung Neoplasms/diagnosis , Tomography, X-Ray Computed , Adult , Aged , Aged, 80 and over , Female , Humans , Male , Middle Aged , Retrospective Studies , Thorax