Results 1 - 16 of 16
1.
IEEE Comput Graph Appl ; 43(2): 78-88, 2023.
Article in English | MEDLINE | ID: mdl-37030833

ABSTRACT

We present a conceptual framework for the development of visual interactive techniques to formalize and externalize trust in machine learning (ML) workflows. Currently, trust in ML applications is an implicit process that takes place in the user's mind. As such, there is no method of feedback or communication of trust that can be acted upon. Our framework will be instrumental in developing interactive visualization approaches that will help users to efficiently and effectively build and communicate trust in ways that fit each of the ML process stages. We formulate several research questions and directions that include: 1) a typology/taxonomy of trust objects, trust issues, and possible reasons for (mis)trust; 2) formalisms to represent trust in machine-readable form; 3) means by which users can express their state of trust by interacting with a computer system (e.g., text, drawing, marking); 4) ways in which a system can facilitate users' expression and communication of the state of trust; and 5) creation of visual interactive techniques for representation and exploration of trust over all stages of an ML pipeline.

2.
Eur J Epidemiol ; 38(6): 689-697, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37079135

ABSTRACT

In many populations, the peak period of incidence of type 1 diabetes (T1D) has been observed to be around 10-14 years of age, coinciding with puberty, but direct evidence of the role of puberty in the development of T1D is limited. We therefore aimed to investigate whether puberty and the timing of its onset are associated with the development of islet autoimmunity (IA) and subsequent progression to T1D. A Finnish population-based cohort of children with HLA-DQB1-conferred susceptibility to T1D was followed from 7 years of age until 15 years of age or until a diagnosis of T1D (n = 6920). T1D-associated autoantibodies and growth were measured at 3- to 12-month intervals, and pubertal onset timing was assessed based on growth. The analyses used a three-state survival model. IA was defined as being either positive for islet cell antibodies plus at least one biochemical autoantibody (ICA + 1) or as being repeatedly positive for at least one biochemical autoantibody (BC1). Depending on the IA definition, either 303 (4.4%, ICA + 1) or 435 (6.3%, BC1) children tested positive for IA by the age of 7 years, and 211 (3.2%, ICA + 1) or 198 (5.3%, BC1) developed IA during follow-up. A total of 172 (2.5%) individuals developed T1D during follow-up, of whom 169 were positive for IA prior to the clinical diagnosis. Puberty was associated with an increase in the risk of progression to T1D, but only from ICA + 1-defined IA (hazard ratio 1.57; 95% confidence interval 1.14, 2.16), and the timing of pubertal onset did not affect the association. No association between puberty and the risk of IA was detected. In conclusion, puberty may affect the risk of progression but is not a risk factor for IA.


Subjects
Diabetes Mellitus Type 1, Pancreatic Islets, Child, Humans, Adolescent, Diabetes Mellitus Type 1/epidemiology, Autoimmunity, Disease Progression, Autoantibodies, Puberty
3.
BMC Med Inform Decis Mak ; 22(1): 134, 2022 05 17.
Article in English | MEDLINE | ID: mdl-35581648

ABSTRACT

BACKGROUND AND OBJECTIVE: Emergency Department (ED) overcrowding is a chronic international issue that is associated with adverse treatment outcomes. Accurate forecasts of future service demand would enable intelligent resource allocation that could alleviate the problem. There has been continued academic interest in ED forecasting, but the number of explanatory variables used has been low, limited mainly to calendar and weather variables. In this study we investigate whether the predictive accuracy of next-day arrivals could be enhanced using a high number of potentially relevant explanatory variables, and we document two feature selection processes that aim to identify which subset of variables is associated with the number of next-day arrivals. Performance of such predictions over longer horizons is also shown. METHODS: We extracted numbers of total daily arrivals from Tampere University Hospital ED between June 1, 2015 and June 19, 2019. 158 potential explanatory variables were collected from multiple data sources consisting not only of weather and calendar variables but also an extensive list of local public events, numbers of website visits to two hospital domains, numbers of available hospital beds in 33 local hospitals or health centres and Google Trends searches for the ED. We used two feature selection processes: Simulated Annealing (SA) and Floating Search (FS) with Recursive Least Squares (RLS) and Least Mean Squares (LMS). Performance of these approaches was compared against autoregressive integrated moving average (ARIMA), regression with ARIMA errors (ARIMAX) and Random Forest (RF). Mean Absolute Percentage Error (MAPE) was used as the main error metric. RESULTS: Calendar variables, load of secondary care facilities and local public events were dominant in the identified predictive features. RLS-SA and RLS-FS provided slightly better accuracy compared to ARIMA. ARIMAX was the most accurate model but the difference between RLS-SA and RLS-FS was not statistically significant. CONCLUSIONS: Our study provides new insight into potential underlying factors associated with the number of next-day presentations. It also suggests that the predictive accuracy of next-day arrivals can be increased using a high-dimensional feature selection approach when compared to both univariate and nonfiltered high-dimensional approaches. Performance over multiple horizons was similar, with a gradual decline for longer horizons. However, outperforming ARIMAX remains a challenge when working with daily data. Future work should focus on enhancing the feature selection mechanism, investigating its applicability to other domains and identifying other potentially relevant explanatory variables.
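
The abstract names MAPE as its main error metric. As a rough illustration only, the sketch below computes MAPE for a seasonal-naive forecast (each day predicted by the same weekday one week earlier). The arrival counts are made up, and the seasonal-naive baseline and the function names are assumptions for illustration; neither comes from the study.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


def seasonal_naive(history, season=7):
    """Forecast each day with the value seen `season` days earlier.

    The returned list lines up with history[season:].
    This baseline is a hypothetical choice, not one of the study's models.
    """
    return history[:-season]


# Made-up daily arrival counts for two weeks (illustrative only).
arrivals = [200, 180, 190, 210, 220, 150, 140,
            210, 175, 195, 205, 230, 160, 135]
forecast = seasonal_naive(arrivals)
actual = arrivals[7:]
print(f"Seasonal-naive MAPE: {mape(actual, forecast):.2f} %")
```

Because MAPE divides by the observed value, it is only defined when every observed count is non-zero, which is a reasonable assumption for total daily ED arrivals.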


Subjects
Hospital Emergency Service, Information Storage and Retrieval, Forecasting, Humans, Resource Allocation, Time
4.
IEEE Trans Vis Comput Graph ; 28(6): 2376-2387, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35157586

ABSTRACT

Cartograms are popular for visualizing numerical data for administrative regions in thematic maps. When there are multiple data values per region (over time or from different datasets) shown as animated or juxtaposed cartograms, preserving the viewer's mental map in terms of stability between multiple cartograms is another important criterion alongside traditional cartogram criteria such as maintaining adjacencies. We present a method to compute stable Demers cartograms, where each region is shown as a square scaled proportionally to the given numerical data and similar data yield similar cartograms. We enforce orthogonal separation constraints using linear programming, and measure quality in terms of keeping adjacent regions close (cartogram quality) and using similar positions for a region between the different data values (stability). Our method guarantees the ability to connect most lost adjacencies with minimal-length planar orthogonal polylines. Experiments show that our method yields good quality and stability on multiple quality criteria.
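
Two ingredients of the description above can be sketched in a few lines: sizing each square from its data value, and testing whether two squares satisfy an orthogonal (axis-aligned) separation constraint. The sketch assumes that a square's area, rather than its side, is proportional to the value, and the function names are hypothetical. It does not reproduce the paper's linear program.

```python
import math


def square_sides(values, total_area=1.0):
    """Side length per region so that square AREA is proportional to its value.

    Area-proportional scaling is an assumption; the abstract only says
    'scaled proportionally'.
    """
    total = sum(values.values())
    return {region: math.sqrt(total_area * v / total) for region, v in values.items()}


def separated(center_a, side_a, center_b, side_b):
    """True if two axis-aligned squares do not overlap.

    They are separated when their centers are at least half the sum of the
    side lengths apart along the x axis or along the y axis.
    """
    gap = (side_a + side_b) / 2.0
    return (abs(center_a[0] - center_b[0]) >= gap
            or abs(center_a[1] - center_b[1]) >= gap)


sides = square_sides({"A": 1.0, "B": 3.0}, total_area=4.0)
print(sides)
print(separated((0.0, 0.0), sides["A"], (2.0, 0.0), sides["B"]))
```

A linear program can only encode one of the two disjuncts per pair of squares (separate horizontally or separate vertically), so a layout method has to pick a direction for each pair before solving.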

5.
IEEE Trans Vis Comput Graph ; 28(1): 313-323, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34587038

ABSTRACT

Edge bundling techniques cluster edges with similar attributes (i.e. similarity in direction and proximity) together to reduce the visual clutter. All edge bundling techniques to date implicitly or explicitly cluster groups of individual edges, or parts of them, together based on these attributes. These clusters can result in ambiguous connections that do not exist in the data. Confluent drawings of networks do not have these ambiguities, but require the layout to be computed as part of the bundling process. We devise a new bundling method, Edge-Path bundling, to simplify edge clutter while greatly reducing ambiguities compared to previous bundling techniques. Edge-Path bundling takes a layout as input and clusters each edge along a weighted, shortest path to limit its deviation from a straight line. Edge-Path bundling does not incur independent edge ambiguities typically seen in all edge bundling methods, and the level of bundling can be tuned through shortest path distances, Euclidean distances, and combinations of the two. Also, directed edge bundling naturally emerges from the model. Through metric evaluations, we demonstrate the advantages of Edge-Path bundling over other techniques.
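
The idea of routing an edge along a shortest path of other edges, subject to a bound on how far it may deviate from a straight line, can be sketched as follows. This is a minimal reading of the abstract, not the authors' implementation. The processing order (longest edges first), the plain Euclidean edge weights, the rule that edges used by a route are not bundled themselves, and the `max_distortion` parameter are all assumptions made for illustration.

```python
import heapq
import math


def shortest_path(adj, src, dst, banned):
    """Dijkstra over `adj`, ignoring undirected edges listed in `banned`."""
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj[u]:
            if frozenset((u, v)) in banned:
                continue
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None, math.inf
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]


def edge_path_bundle(pos, edges, max_distortion=2.0):
    """Map each bundled edge (u, v) to the vertex path it is routed along."""
    length = {frozenset(e): math.dist(pos[e[0]], pos[e[1]]) for e in edges}
    adj = {v: [] for v in pos}
    for u, v in edges:
        w = length[frozenset((u, v))]
        adj[u].append((v, w))
        adj[v].append((u, w))
    skipped, locked, routes = set(), set(), {}
    for u, v in sorted(edges, key=lambda e: -length[frozenset(e)]):
        key = frozenset((u, v))
        if key in locked:
            continue  # edge already carries another bundle; keep it straight
        skipped.add(key)
        path, path_len = shortest_path(adj, u, v, skipped)
        if path is None or path_len > max_distortion * length[key]:
            skipped.discard(key)  # detour too long: draw the edge directly
            continue
        locked.update(frozenset(p) for p in zip(path, path[1:]))
        routes[(u, v)] = path
    return routes


pos = {"A": (0.0, 0.0), "B": (2.0, 0.0), "C": (1.0, 0.1)}
edges = [("A", "B"), ("A", "C"), ("C", "B")]
print(edge_path_bundle(pos, edges))
```

In a drawing step, the vertices of each returned path would serve as control points for a smooth curve. Because every bundled edge follows edges that exist in the graph, the route never suggests a connection that is absent from the data.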

6.
IEEE Comput Graph Appl ; 41(6): 7-12, 2021.
Article in English | MEDLINE | ID: mdl-34890313

ABSTRACT

The increasing use of artificial intelligence (AI) technologies across application domains has prompted our society to pay closer attention to AI's trustworthiness, fairness, interpretability, and accountability. In order to foster trust in AI, it is important to consider the potential of interactive visualization, and how such visualizations help build trust in AI systems. This manifesto discusses the relevance of interactive visualizations and makes the following four claims: i) trust is not a technical problem, ii) trust is dynamic, iii) visualization cannot address all aspects of trust, and iv) visualization is crucial for human agency in AI.


Subjects
Artificial Intelligence, Trust, Humans, Social Responsibility
7.
PLoS One ; 16(11): e0260137, 2021.
Article in English | MEDLINE | ID: mdl-34793547

ABSTRACT

OBJECTIVE: Growth-based determination of pubertal onset timing would be cheap and practical. We aimed to determine this timing based on pubertal growth markers. Secondary aims were to estimate the differences in growth between cohorts and identify the role of overweight in onset timing. DESIGN: This multicohort study includes data from three Finnish cohorts: the Type 1 Diabetes Prediction and Prevention (DIPP, N = 2,825) Study, the Special Turku Coronary Risk Factor Intervention Project (STRIP, N = 711), and the Boy cohort (N = 66). Children were monitored for growth and Tanner staging (except in DIPP). METHODS: The growth data were analyzed using a Super-Imposition by Translation And Rotation growth curve model, and pubertal onset analyses were run using a time-to-pubertal onset model. RESULTS: The time-to-pubertal onset model used age at peak height velocity (aPHV), peak height velocity (PHV), and overweight status as covariates, with interaction between aPHV and overweight status for girls, and succeeded in determining the onset timing. Cross-validation showed a good agreement (71.0% for girls, 77.0% for boys) between the observed and predicted onset timings. Children in STRIP were taller overall (girls: 1.7 [95% CI: 0.9, 2.5] cm, boys: 1.0 [0.3, 2.2] cm) and had higher PHV values (girls: 0.13 [0.02, 0.25] cm/year, boys: 0.35 [0.21, 0.49] cm/year) than those in DIPP. Boys in the Boy cohort were taller (2.3 [0.3, 4.2] cm) compared with DIPP. Overweight girls showed pubertal onset 1.0 [0.7, 1.4] years earlier compared with other girls. In boys, there was no such difference. CONCLUSIONS: The novel modeling approach provides an opportunity to evaluate the Tanner breast/genital stage-based pubertal onset timing in cohort studies including longitudinal data on growth but lacking pubertal follow-up.


Subjects
Forecasting/methods, Puberty/metabolism, Puberty/physiology, Adolescent, Age of Onset, Biological Phenomena, Body Height, Breast/growth & development, Child, Cohort Studies, Female, Finland, Genitalia/growth & development, Growth/physiology, Humans, Male, Men, Theoretical Models, Overweight, Risk Factors, Women
8.
Nat Commun ; 12(1): 2532, 2021 05 05.
Article in English | MEDLINE | ID: mdl-33953203

ABSTRACT

Biological processes are inherently continuous, and the chance of phenotypic discovery is significantly restricted by discretising them. Using multi-parametric active regression we introduce the Regression Plane (RP), a user-friendly discovery tool enabling class-free phenotypic supervised machine learning, to describe and explore biological data in a continuous manner. First, we compare traditional classification with regression in a simulated experimental setup. Second, we use our framework to identify genes involved in regulating triglyceride levels in human cells. Subsequently, we analyse a time-lapse dataset on mitosis to demonstrate that the proposed methodology is capable of modelling complex processes at infinite resolution. Finally, we show that hemocyte differentiation in Drosophila melanogaster has continuous characteristics.


Subjects
Biological Phenomena, Cell Physiological Phenomena, Machine Learning, Animals, Hepatocellular Carcinoma, Cell Cycle, Cell Differentiation, Tumor Cell Line, Drosophila melanogaster, Humans, Membrane Proteins, Supervised Machine Learning
9.
Br J Nutr ; 124(2): 173-180, 2020 Jul 28.
Article in English | MEDLINE | ID: mdl-32102698

ABSTRACT

Several prospective studies have shown an association between cows' milk consumption and the risk of islet autoimmunity and/or type 1 diabetes. We wanted to study whether processing of milk plays a role. A population-based birth cohort of 6081 children with HLA-DQB1-conferred risk to type 1 diabetes was followed until the age of 15 years. We included 5545 children in the analyses. Food records were completed at the ages of 3 and 6 months and 1, 2, 3, 4 and 6 years, and diabetes-associated autoantibodies were measured at 3-12-month intervals. For milk products in the food composition database, we used conventional and processing-based classifications. We analysed the data using a joint model for longitudinal and time-to-event data. By the age of 6 years, islet autoimmunity developed in 246 children. Consumption of all cows' milk products together (energy-adjusted hazard ratio 1·06; 95 % CI 1·02, 1·11; P = 0·003), non-fermented milk products (1·06; 95 % CI 1·01, 1·10; P = 0·011) and fermented milk products (1·35; 95 % CI 1·10, 1·67; P = 0·005) was associated with an increased risk of islet autoimmunity. The early milk consumption was not associated with the risk beyond 6 years. We observed no clear differences based on milk homogenisation and heat treatment. Our results are consistent with the previous studies, which indicate that high milk consumption may cause islet autoimmunity in children at increased genetic risk. The study did not identify any specific type of milk processing that would clearly stand out as a sole risk factor apart from other milk products.

10.
Sci Rep ; 9(1): 7760, 2019 05 23.
Article in English | MEDLINE | ID: mdl-31123290

ABSTRACT

Several dietary factors have been suspected to play a role in the development of advanced islet autoimmunity (IA) and/or type 1 diabetes (T1D), but the evidence is fragmentary. A prospective population-based cohort of 6081 Finnish newborn infants with HLA-DQB1-conferred susceptibility to T1D was followed up to 15 years of age. Diabetes-associated autoantibodies and diet were assessed at 3- to 12-month intervals. We aimed to study the association between consumption of selected foods and the development of advanced IA longitudinally with Cox regression models (CRM), basic joint models (JM) and joint latent class mixed models (JLCMM). The associations of these foods to T1D risk were also studied to investigate consistency between alternative endpoints. The JM showed a marginal association between meat consumption and advanced IA: the hazard ratio adjusted for selected confounding factors was 1.06 (95% CI: 1.00, 1.12). The JLCMM identified two classes in the consumption trajectories of fish and a marginal protective association for high consumers compared to low consumers: the adjusted hazard ratio was 0.68 (0.44, 1.05). Similar findings were obtained for T1D risk with adjusted hazard ratios of 1.13 (1.02, 1.24) for meat and 0.45 (0.23, 0.86) for fish consumption. Estimates from the CRMs were closer to unity and CIs were narrower compared to the JMs. Findings indicate that intake of meat might be directly and fish inversely associated with the development of advanced IA and T1D, and that disease hazards in longitudinal nutritional epidemiology are more appropriately modeled by joint models than with naive approaches.


Subjects
Autoimmunity/immunology, Food Hypersensitivity/etiology, Pancreatic Islets/immunology, Adolescent, Animals, Autoantibodies/immunology, Child, Preschool Child, Cohort Studies, Diabetes Mellitus Type 1/etiology, Diabetes Mellitus Type 1/immunology, Diet/methods, Diet Therapy/methods, Eggs, Female, Finland, Fishes, Genetic Predisposition to Disease, HLA-DQ beta-Chains/metabolism, Humans, Infant, Newborn Infant, Male, Meat, Statistical Models, Proportional Hazards Models, Prospective Studies, Risk Factors
11.
IEEE Trans Vis Comput Graph ; 23(1): 241-250, 2017 01.
Article in English | MEDLINE | ID: mdl-27875141

ABSTRACT

Dimensionality Reduction (DR) is a core building block in visualizing multidimensional data. For DR techniques to be useful in exploratory data analysis, they need to be adapted to human needs and domain-specific problems, ideally, interactively, and on-the-fly. Many visual analytics systems have already demonstrated the benefits of tightly integrating DR with interactive visualizations. Nevertheless, a general, structured understanding of this integration is missing. To address this, we systematically studied the visual analytics and visualization literature to investigate how analysts interact with automatic DR techniques. The results reveal seven common interaction scenarios that are amenable to interactive control such as specifying algorithmic constraints, selecting relevant features, or choosing among several DR algorithms. We investigate specific implementations of visual analysis systems integrating DR, and analyze ways that other machine learning methods have been combined with DR. Summarizing the results in a "human in the loop" process model provides a general lens for the evaluation of visual interactive DR systems. We apply the proposed model to study and classify several systems previously described in the literature, and to derive future research opportunities.

12.
Proc Natl Acad Sci U S A ; 112(42): 13115-20, 2015 Oct 20.
Article in English | MEDLINE | ID: mdl-26438844

ABSTRACT

Genes with similar transcriptional activation kinetics can display very different temporal mRNA profiles because of differences in transcription time, degradation rate, and RNA-processing kinetics. Recent studies have shown that a splicing-associated RNA production delay can be significant. To investigate this issue more generally, it is useful to develop methods applicable to genome-wide datasets. We introduce a joint model of transcriptional activation and mRNA accumulation that can be used for inference of transcription rate, RNA production delay, and degradation rate given data from high-throughput sequencing time course experiments. We combine a mechanistic differential equation model with a nonparametric statistical modeling approach allowing us to capture a broad range of activation kinetics, and we use Bayesian parameter estimation to quantify the uncertainty in estimates of the kinetic parameters. We apply the model to data from estrogen receptor α activation in the MCF-7 breast cancer cell line. We use RNA polymerase II ChIP-Seq time course data to characterize transcriptional activation and mRNA-Seq time course data to quantify mature transcripts. We find that 11% of genes with a good signal in the data display a delay of more than 20 min between completing transcription and mature mRNA production. The genes displaying these long delays are significantly more likely to be short. We also find a statistical association between high delay and late intron retention in pre-mRNA data, indicating significant splicing-associated production delays in many genes.
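
One simple way to read "transcription rate, RNA production delay, and degradation rate" is as a delayed linear differential equation, dm/dt = beta * p(t - delay) - alpha * m(t), where p is the transcriptional activity and m the mature mRNA level. The abstract does not give the paper's exact equations, so this form, the step-shaped activation profile, the parameter values and the time unit (minutes) are all assumptions. The sketch integrates the equation with forward Euler and leaves out the paper's nonparametric and Bayesian components.

```python
def simulate_mrna(pol2, beta, alpha, delay, dt=0.1, t_end=200.0, m0=0.0):
    """Forward-Euler integration of dm/dt = beta * pol2(t - delay) - alpha * m.

    pol2  : function giving transcriptional activity at time t (hypothetical input)
    beta  : transcription rate scaling
    alpha : mRNA degradation rate
    delay : lag between completed transcription and mature mRNA
    Returns a list of (time, mRNA level) pairs.
    """
    steps = int(round(t_end / dt))
    m = m0
    trace = [(0.0, m)]
    for k in range(steps):
        t = k * dt
        m += dt * (beta * pol2(t - delay) - alpha * m)
        trace.append((t + dt, m))
    return trace


def step_activation(t, onset=10.0):
    """Toy activity profile: off before `onset`, fully on afterwards."""
    return 1.0 if t >= onset else 0.0


trace = simulate_mrna(step_activation, beta=2.0, alpha=0.05, delay=20.0)
print(f"final mRNA level: {trace[-1][1]:.2f} (steady state beta/alpha = {2.0 / 0.05:.1f})")
```

In this toy setting the mRNA curve stays flat until the activation onset plus the delay has passed and then relaxes towards beta/alpha. This is the kind of signature that allows a delay to be distinguished from slow degradation in time course data.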


Subjects
Human Genome, Genetic Models, RNA/biosynthesis, Genetic Transcription, Estrogen Receptor alpha/metabolism, Humans, Kinetics, MCF-7 Cells, RNA/genetics, Signal Transduction
13.
PLoS One ; 9(11): e113053, 2014.
Article in English | MEDLINE | ID: mdl-25427176

ABSTRACT

A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations, for instance between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database.


Subjects
Computational Biology/statistics & numerical data, Genetic Databases/statistics & numerical data, Human Genome, Information Storage and Retrieval/statistics & numerical data, Atlases as Topic, Computational Biology/methods, Datasets as Topic, Gene Expression, Humans, Information Storage and Retrieval/methods
14.
BMC Bioinformatics ; 8 Suppl 2: S11, 2007 May 03.
Article in English | MEDLINE | ID: mdl-17493249

ABSTRACT

BACKGROUND: Human endogenous retroviruses (HERVs) are surviving traces of ancient retrovirus infections and now reside within the human DNA. Recently HERV expression has been detected in both normal tissues and diseased patients. However, the activities (expression levels) of individual HERV sequences are mostly unknown. RESULTS: We introduce a generative mixture model, based on Hidden Markov Models, for estimating the activities of the individual HERV sequences from EST (expressed sequence tag) databases. We use the model to estimate the relative activities of 181 HERVs. We also empirically justify a faster heuristic method for HERV activity estimation and use it to estimate the activities of 2450 HERVs. The majority of the HERV activities were previously unknown. CONCLUSION: (i) Our methods estimate activity accurately based on experiments on simulated data. (ii) Our estimate on real data shows that 7% of the HERVs are active. The active ones are spread unevenly into HERV groups and relatively uniformly in terms of estimated age. HERVs with the retroviral env gene are more often active than HERVs without env. Few of the active HERVs have open reading frames for retroviral proteins.


Subjects
Algorithms, Chromosome Mapping/methods, Genetic Databases, Molecular Evolution, Expressed Sequence Tags, Viral Genome/genetics, Retroviridae/genetics, Virus Activation/genetics, Humans, Markov Chains, Retroviridae/classification, Species Specificity
15.
IEEE Trans Neural Netw ; 16(1): 68-83, 2005 Jan.
Article in English | MEDLINE | ID: mdl-15732390

ABSTRACT

A simple probabilistic model is introduced to generalize classical linear discriminant analysis (LDA) in finding components that are informative of or relevant for data classes. The components maximize the predictability of the class distribution which is asymptotically equivalent to 1) maximizing mutual information with the classes, and 2) finding principal components in the so-called learning or Fisher metrics. The Fisher metric measures only distances that are relevant to the classes, that is, distances that cause changes in the class distribution. The components have applications in data exploration, visualization, and dimensionality reduction. In empirical experiments, the method outperformed, in addition to more classical methods, a Renyi entropy-based alternative while having essentially equivalent computational cost.
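
The statement that the Fisher metric measures only distances that change the class distribution can be checked numerically. The sketch below uses the Kullback-Leibler divergence between class posteriors at two nearby points, which to second order is a quadratic form in the Fisher information matrix. The two-class logistic posterior and its weight vector are toy assumptions, not the paper's model.

```python
import math


def posterior(x, w=(3.0, 0.0)):
    """Toy two-class posterior p(c | x): logistic along direction w (an assumption)."""
    z = w[0] * x[0] + w[1] * x[1]
    p1 = 1.0 / (1.0 + math.exp(-z))
    return (1.0 - p1, p1)


def local_distance(x, dx):
    """KL divergence between the class posteriors at x and at x + dx."""
    p = posterior(x)
    q = posterior((x[0] + dx[0], x[1] + dx[1]))
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


origin = (0.0, 0.0)
along = local_distance(origin, (0.1, 0.0))   # step that changes the posterior
across = local_distance(origin, (0.0, 0.1))  # step that leaves it unchanged
print(f"along w: {along:.4f}, across w: {across:.4f}")
```

Both steps have the same Euclidean length, yet only the step along `w` has a non-zero distance here, because the step across `w` leaves the class posterior unchanged.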


Subjects
Algorithms, Artificial Intelligence, Computing Methodologies, Discriminant Analysis, Information Storage and Retrieval/methods, Statistical Models, Cluster Analysis, Factual Databases, Neural Networks (Computer), Automated Pattern Recognition/methods, Principle-Based Ethics
16.
Neural Netw ; 17(8-9): 1087-100, 2004.
Article in English | MEDLINE | ID: mdl-15555853

ABSTRACT

We have earlier introduced a principle for learning metrics, which shows how metric-based methods can be made to focus on discriminative properties of data. The main applications are in supervising unsupervised learning to model interesting variation in data, instead of modeling all variation as plain unsupervised learning does. The metrics are derived by approximations to an information-geometric formulation. In this paper, we review the theory, introduce better approximations to the distances, and show how to apply them in two different kinds of unsupervised methods: prototype-based and pairwise distance-based. The two examples are self-organizing maps and multidimensional scaling (Sammon's mapping).
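
For the pairwise distance-based case, Sammon's mapping minimizes a stress that compares input-space distances with distances in the low-dimensional display. The sketch below evaluates the standard Sammon stress for a given embedding. The `input_dist` argument is a hypothetical hook where a learned metric could replace the Euclidean default, and the points are made up. The paper's own distance approximations are not reproduced.

```python
import math
from itertools import combinations


def sammon_stress(high, low, input_dist=math.dist):
    """Sammon stress: sum over pairs of (d_in - d_out)^2 / d_in, divided by sum of d_in.

    high       : points in the original space
    low        : their positions in the display space (same order)
    input_dist : distance used in the original space (Euclidean by default);
                 all input points must be distinct so that d_in > 0.
    """
    weighted_error, total = 0.0, 0.0
    for i, j in combinations(range(len(high)), 2):
        d_in = input_dist(high[i], high[j])
        d_out = math.dist(low[i], low[j])
        weighted_error += (d_in - d_out) ** 2 / d_in
        total += d_in
    return weighted_error / total


high = [(0.0, 0.0), (3.0, 4.0)]   # input distance 5
low = [(0.0,), (4.0,)]            # display distance 4
print(sammon_stress(high, low))
```

The division by the input distance gives small input distances more weight, so the stress emphasizes preserving local neighbourhoods over global layout.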


Subjects
Artificial Intelligence, Neural Networks (Computer), Artifacts