Results 1 - 20 of 64
1.
Heliyon ; 10(11): e31368, 2024 Jun 15.
Article in English | MEDLINE | ID: mdl-38828293

ABSTRACT

Although physical education (PE) is a compulsory college course, it has been undervalued because assessment standards are open and students study only to pass the final exam. This paper therefore explores a new model of college PE teaching. Taking the air volleyball course as an example, it redesigned college PE teaching based on the theories of the flipped classroom and outcomes-based education (OBE). It also proposed a personalised learning system for college sports based on a genetic algorithm (GA) and data structures, which greatly improved students' autonomous learning ability and willingness in college PE. In designing the teaching model, the paper compared the model combining the flipped classroom and OBE with the flipped classroom model alone, the OBE model alone, and the traditional teaching model. A one-semester investigation was conducted with 40 students who took the air volleyball PE course at Hebei Normal University for Nationalities. The 40 students were divided into four groups, and their learning outcomes after one semester were compared. Compared with the traditional teaching model group, the combined OBE and flipped classroom group's performance in hitting, passing, spiking, and serving the ball increased by 3.8%, 14.3%, 20.8%, and 10.3%, respectively. This suggests that the new college PE teaching model based on the flipped classroom and OBE has a good teaching effect and can serve as a reference for others.

2.
Stat Med ; 2024 Jun 10.
Article in English | MEDLINE | ID: mdl-38853711

ABSTRACT

Analysis of integrated data often requires record linkage in order to join together the data residing in separate sources. In case linkage errors cannot be avoided, due to the lack of a unique identity key that can be used to link the records unequivocally, standard statistical techniques may produce misleading inference if the linked data are treated as if they were true observations. In this paper, we propose methods for categorical data analysis based on linked data that are not prepared by the analyst, such that neither the match-key variables nor the unlinked records are available. The adjustment is based on the proportion of false links in the linked file, and our approach allows the probabilities of correct linkage to vary across the records without requiring that one is able to estimate this probability for each individual record. It also accommodates the general situation where unmatched records that cannot possibly be correctly linked exist in all the sources. The proposed methods are studied by simulation and applied to real data.
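
To make the idea of adjusting by the proportion of false links concrete, here is a minimal sketch in Python. It is not the method proposed in the paper. It assumes a deliberately simple model in which a false link pairs the two variables independently, so the observed joint distribution is a mixture of the true joint distribution and the product of its marginals. The function name `adjust_for_false_links` and the example numbers are hypothetical.

```python
def adjust_for_false_links(observed, false_link_rate):
    """Estimate the true joint distribution of (X, Y) from a linked-data table.

    Illustrative assumption (not taken from the paper):
        observed = (1 - a) * true + a * (row marginal * column marginal)
    where a is the proportion of false links and false links pair X and Y
    independently. Under this model the marginals are unaffected by linkage
    error, so the mixture can be inverted cell by cell.
    """
    if not 0 <= false_link_rate < 1:
        raise ValueError("false_link_rate must be in [0, 1)")
    total = sum(sum(row) for row in observed)
    p_obs = [[cell / total for cell in row] for row in observed]
    row_marg = [sum(row) for row in p_obs]
    col_marg = [sum(col) for col in zip(*p_obs)]
    a = false_link_rate
    return [
        [(p_obs[i][j] - a * row_marg[i] * col_marg[j]) / (1 - a)
         for j in range(len(col_marg))]
        for i in range(len(row_marg))
    ]


# Hypothetical 2x2 table of proportions observed in a linked file
# in which 20% of the links are assumed to be false.
observed_table = [[0.37, 0.13], [0.13, 0.37]]
adjusted_table = adjust_for_false_links(observed_table, 0.2)
```

With sampled counts rather than exact proportions, adjusted cells can fall slightly below zero, so a real analysis would need a constrained estimator rather than this direct inversion.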

3.
Genome Biol ; 25(1): 106, 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38664753

ABSTRACT

Centrifuger is an efficient taxonomic classification method that compares sequencing reads against a microbial genome database. In Centrifuger, the Burrows-Wheeler transformed genome sequences are losslessly compressed using a novel scheme called run-block compression. Run-block compression achieves sublinear space complexity and is effective at compressing diverse microbial databases like RefSeq while supporting fast rank queries. Combining this compression method with other strategies for compacting the Ferragina-Manzini (FM) index, Centrifuger reduces the memory footprint by half compared to other FM-index-based approaches. Furthermore, the lossless compression and the unconstrained match length help Centrifuger achieve greater accuracy than competing methods at lower taxonomic levels.
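
As background for the rank queries mentioned in the abstract, the following is a minimal textbook sketch of FM-index pattern counting over a Burrows-Wheeler transform. It does not implement run-block compression or anything specific to Centrifuger: the suffix array is built naively and `rank` scans the string, which is only reasonable for tiny inputs. All function names are hypothetical, and the text is assumed not to contain the `$` sentinel.

```python
from collections import Counter


def build_fm_index(text):
    """Return (bwt, first) for text, using '$' as the end-of-text sentinel."""
    text += "$"
    # Naive suffix array: sort suffix start positions by the suffix itself.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT character = character preceding each sorted suffix (wraps to '$').
    bwt = "".join(text[i - 1] for i in suffix_array)
    # first[c] = number of characters in the text strictly smaller than c.
    counts = Counter(text)
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    return bwt, first


def rank(bwt, ch, i):
    """Number of occurrences of ch in bwt[:i] (naive linear scan)."""
    return bwt[:i].count(ch)


def count_occurrences(bwt, first, pattern):
    """Count matches of pattern via backward search over the BWT."""
    lo, hi = 0, len(bwt)
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + rank(bwt, ch, lo)
        hi = first[ch] + rank(bwt, ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


bwt, first = build_fm_index("banana")
matches = count_occurrences(bwt, first, "ana")  # "ana" occurs twice in "banana"
```

Each pattern character costs two rank queries, which is why compressed representations that keep rank fast matter for the memory footprint of such an index.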


Subjects
Data Compression, Metagenomics, Data Compression/methods, Metagenomics/methods, Software, Microbial Genome, Bacterial Genome, DNA Sequence Analysis/methods
4.
Genome Biol Evol ; 16(2), 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38252924

ABSTRACT

Comparative sequence analysis permits unraveling the molecular processes underlying gene evolution. Many statistical methods generate candidate positions within genes, such as fast or slowly evolving sites, coevolving groups of residues, sites undergoing positive selection, or changes in evolutionary rates. Understanding the functional causes of these evolutionary patterns requires combining the results of these analyses and mapping them onto molecular structures, a complex task involving distinct coordinate referential systems. To ease this task, we introduce the site/group extended data format, a simple text format to store (groups of) site annotations. We developed a toolset, the SgedTools, which permits site/group extended data file manipulation, creating them from various software outputs and translating coordinates between individual sequences, alignments, and three-dimensional structures. The package also includes a Monte-Carlo procedure to generate random site samples, possibly conditioning on site-specific features. This eases the statistical testing of evolutionary hypotheses, accounting for the structural properties of the encoded molecules.


Subjects
Molecular Evolution, Software, Sequence Alignment, Sequence Analysis
5.
Adv Mater ; 36(13): e2311040, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38145578

ABSTRACT

Graphs adequately represent the enormous interconnections among numerous entities in big data, incurring high computational costs in analyzing them with conventional hardware. Physical graph representation (PGR) is an approach that replicates the graph within a physical system, allowing for efficient analysis. This study introduces a cross-wired crossbar array (cwCBA), uniquely connecting diagonal and non-diagonal components in a CBA by a cross-wiring process. The cross-wired diagonal cells enable cwCBA to achieve precise PGR and dynamic node state control. For this purpose, a cwCBA is fabricated using Pt/Ta2O5/HfO2/TiN (PTHT) memristor with high on/off and self-rectifying characteristics. The structural and device benefits of PTHT cwCBA for enhanced PGR precision are highlighted, and the practical efficacy is demonstrated for two applications. First, it executes a dynamic path-finding algorithm, identifying the shortest paths in a dynamic graph. PTHT cwCBA shows a more accurate inferred distance and ≈1/3800 lower processing complexity than the conventional method. Second, it analyzes the protein-protein interaction (PPI) networks containing self-interacting proteins, which possess intricate characteristics compared to typical graphs. The PPI prediction results exhibit an average of 30.5% and 21.3% improvement in area under the curve and F1-score, respectively, compared to existing algorithms.

6.
bioRxiv ; 2023 Nov 17.
Article in English | MEDLINE | ID: mdl-38014029

ABSTRACT

Centrifuger is an efficient taxonomic classification method that compares sequencing reads against a microbial genome database. In Centrifuger, the Burrows-Wheeler transformed genome sequences are losslessly compressed using a novel scheme called run-block compression. Run-block compression achieves sublinear space complexity and is effective at compressing diverse microbial databases like RefSeq while supporting fast rank queries. Combining this compression method with other strategies for compacting the Ferragina-Manzini (FM) index, Centrifuger reduces the memory footprint by half compared to other FM-index-based approaches. Furthermore, the lossless compression and the unconstrained match length help Centrifuger achieve greater accuracy than competing methods at lower taxonomic levels.

7.
Front Neuroinform ; 17: 1251023, 2023.
Article in English | MEDLINE | ID: mdl-37841811

ABSTRACT

Neuroimaging research requires sophisticated tools for analyzing complex data, but efficiently leveraging these tools can be a major challenge, especially on large datasets. CBRAIN is a web-based platform designed to simplify the use and accessibility of neuroimaging research tools for large-scale, collaborative studies. In this paper, we describe how CBRAIN's unique features and infrastructure were leveraged to integrate TAPAS PhysIO, an open-source MATLAB toolbox for physiological noise modeling in fMRI data. This case study highlights three key elements of CBRAIN's infrastructure that enable streamlined, multimodal tool integration: a user-friendly GUI, a Brain Imaging Data Structure (BIDS) data-entry schema, and convenient in-browser visualization of results. By incorporating PhysIO into CBRAIN, we achieved significant improvements in the speed, ease of use, and scalability of physiological preprocessing. Researchers now have access to a uniform and intuitive interface for analyzing data, which facilitates remote and collaborative evaluation of results. With these improvements, CBRAIN aims to become an essential open-science tool for integrative neuroimaging research, supporting FAIR principles and enabling efficient workflows for complex analysis pipelines.

8.
iScience ; 26(8): 107402, 2023 Aug 18.
Article in English | MEDLINE | ID: mdl-37575187

ABSTRACT

A Wheeler graph represents a collection of strings in a way that is particularly easy to index and query. Such a graph is a practical choice for representing a graph-shaped pangenome, and it is the foundation for current graph-based pangenome indexes. However, there are no practical tools to visualize or to check graphs that may have the Wheeler properties. Here, we present Wheelie, an algorithm that combines a renaming heuristic with a permutation solver (Wheelie-PR) or a Satisfiability Modulo Theory (SMT) solver (Wheelie-SMT) to check whether a given graph has the Wheeler properties, a problem that is NP-complete in general. Wheelie can check a variety of random and real-world graphs in far less time than any algorithm proposed to date. It can check a graph with thousands of nodes in seconds. We implement these algorithms together with complementary visualization tools in the WGT toolkit, available as open source software at https://github.com/Kuanhao-Chao/Wheeler_Graph_Toolkit.
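
The Wheeler properties can be stated compactly in code. The sketch below is not Wheelie: it checks one candidate node ordering against the standard Wheeler graph definition and then tries every permutation, which is only feasible for a handful of nodes. Edges are `(source, target, label)` triples, and all names are hypothetical.

```python
from itertools import permutations


def is_wheeler_order(order, edges):
    """Check whether order (a list of nodes) satisfies the Wheeler properties.

    Properties checked, for edges (u, v, a) and (u2, v2, a2):
      1. nodes with in-degree 0 come before all other nodes;
      2. a < a2 implies v < v2;
      3. a == a2 and u < u2 implies v <= v2.
    """
    pos = {node: i for i, node in enumerate(order)}
    has_incoming = {v for _, v, _ in edges}
    sources = [pos[n] for n in order if n not in has_incoming]
    others = [pos[n] for n in order if n in has_incoming]
    if sources and others and max(sources) > min(others):
        return False
    for u, v, a in edges:
        for u2, v2, a2 in edges:
            if a < a2 and not pos[v] < pos[v2]:
                return False
            if a == a2 and pos[u] < pos[u2] and not pos[v] <= pos[v2]:
                return False
    return True


def find_wheeler_order(nodes, edges):
    """Brute-force search over all orderings; exponential in len(nodes)."""
    for candidate in permutations(nodes):
        if is_wheeler_order(candidate, edges):
            return list(candidate)
    return None


# Node 0 has an 'a' edge to node 1 and a 'b' edge to node 2.
small_graph = [(0, 1, "a"), (0, 2, "b")]
order = find_wheeler_order([0, 1, 2], small_graph)
```

Verifying a given ordering is polynomial; the hardness lies in finding one, which is where heuristics and solvers such as those described in the abstract come in.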

9.
Entropy (Basel) ; 25(7), 2023 Jul 13.
Article in English | MEDLINE | ID: mdl-37510005

ABSTRACT

Machine learning has become increasingly popular in academic and industrial communities and has been widely implemented in various online applications due to its powerful ability to analyze and use data. Among all the machine learning models, decision tree models stand out due to their great interpretability and simplicity, and have been implemented in cloud computing services for various purposes. Despite its great success, the integrity issue of online decision tree prediction is a growing concern. The correctness and consistency of decision tree predictions in cloud computing systems need more security guarantees since verifying the correctness of the model prediction remains challenging. Meanwhile, blockchain has a promising prospect in two-party machine learning services as the immutable and traceable characteristics satisfy the verifiable settings in machine learning services. In this paper, we initiate the study of decision tree prediction services on blockchain systems and propose VDT, a Verifiable Decision Tree prediction scheme for decision tree prediction. Specifically, by leveraging the Merkle tree and hash function, the scheme allows the service provider to generate a verification proof to convince the client that the output of the decision tree prediction is correctly computed on a particular data sample. It is further extended to an update method for a verifiable decision tree to modify the decision tree model efficiently. We prove the security of the proposed VDT schemes and evaluate their performance using real datasets. Experimental evaluations show that our scheme requires less than one second to produce verifiable proof.
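
The abstract relies on Merkle trees and hash functions to produce verification proofs. The sketch below shows only the generic building block, a Merkle inclusion proof built with SHA-256 from the Python standard library. It is not the VDT scheme; the leaf contents (serialized decision tree nodes) and all function names are assumptions made for illustration.

```python
import hashlib


def _hash(data):
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return all tree levels, from hashed leaves up to the single root."""
    levels = [[_hash(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        current = levels[-1]
        if len(current) % 2:  # duplicate the last node on odd-sized levels
            current = current + [current[-1]]
        levels.append([_hash(current[i] + current[i + 1])
                       for i in range(0, len(current), 2)])
    return levels


def make_proof(levels, index):
    """Collect (sibling_hash, sibling_is_left) pairs from leaf to root."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(leaf, proof, root):
    """Recompute the root from a leaf and its proof and compare."""
    node = _hash(leaf)
    for sibling, sibling_is_left in proof:
        node = _hash(sibling + node) if sibling_is_left else _hash(node + sibling)
    return node == root


# Hypothetical serialized decision-tree nodes used as leaves.
tree_nodes = [b"x1<=0.5", b"x2<=1.7", b"leaf:A", b"leaf:B", b"leaf:C"]
levels = build_levels(tree_nodes)
root = levels[-1][0]
proof = make_proof(levels, 3)
```

A client that holds only `root` can then check `verify(b"leaf:B", proof, root)` without seeing the rest of the tree; the proof length grows logarithmically with the number of leaves.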

10.
Indian J Public Health ; 67(1): 72-77, 2023.
Article in English | MEDLINE | ID: mdl-37039209

ABSTRACT

Background: Child mortality is a major public health issue. Studies of under-five mortality that ignore the hierarchical structure of the data can lead to misleading interpretations, because observations in the same cluster share common cluster-level random effects. Objectives: The present study uses a multilevel model to analyze under-five mortality and identify its significant factors in Manipur. Methods: National Family Health Survey-5 (2019-21) data are used in the present study. A multilevel mixed-effects Weibull parametric survival model was fitted to determine the factors affecting under-five mortality. We constructed three-level data: individuals are nested within primary sampling units (PSUs), and PSUs are nested within districts. Results: Of the 3225 under-five children, 85 (2.64%) died. For the three-level mixed-effects Weibull parametric survival model with PSUs nested within districts, the likelihood-ratio test (Chi-square = 10.98, P = 0.004 < 0.05) indicated that the random-intercept model fits the data better than the fixed-effect model. Four covariates, namely the number of births in the last 5 years, age of mother at first birth, use of contraceptives, and size of child at birth, were found to be risk factors for under-five mortality at the 5% level of significance. Conclusions: In the random-intercept model, the estimated variances of the random-intercept effects at the district and PSU levels are 0.27 and 0.31, respectively. These values indicate variation (unobserved heterogeneity) in the risk of death of under-five children between districts and between PSUs.
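
The reported figures can be checked with a few lines of arithmetic. The sketch below assumes the likelihood-ratio statistic is referred to a chi-square distribution with 2 degrees of freedom (one per variance component); the abstract does not state the degrees of freedom, so that is an assumption. For 2 degrees of freedom the chi-square survival function has the closed form exp(-x/2), so no statistics library is needed.

```python
import math


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2)


# Likelihood-ratio statistic reported in the abstract.
p_value = chi2_sf_df2(10.98)

# Share of under-five deaths: 85 out of 3225 children.
mortality_pct = 100 * 85 / 3225
```

Under that assumption `p_value` rounds to 0.004 and `mortality_pct` to 2.64, matching the abstract. Tests of variance components sit on the boundary of the parameter space, so the naive chi-square reference used here is, if anything, conservative.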


Subjects
Mothers, Public Health, Newborn Infant, Child, Female, Humans, India/epidemiology, Survival Analysis, Risk Factors
11.
Ann Biomed Res ; 5(1), 2023.
Article in English | MEDLINE | ID: mdl-38179070

ABSTRACT

Delivering radiation therapy based on erroneous or corrupted treatment plan data has previously and unfortunately resulted in severe, sometimes grave patient harm. Aiming to prevent such harm and improve safety in radiation therapy treatment, this work introduces a novel, yet intuitive algorithm for strategically structuring the complex and unstructured data typical of modern treatment plans so their treatment sites may automatically be verified with deep-learning architectures. The proposed algorithm utilizes geometric and dose plan parameters to represent each plan's data as a heat map to feed a deep-learning classifier that will predict the plan's treatment site. Once it is returned by the classifier, a plan's predicted site can be compared to its documented intended site, and a warning raised should the two differ. Using real head-neck, breast, and prostate treatment plan data retrieved at two hospitals in the United States, the algorithm is evaluated by observing the accuracy of convolutional neural networks (ConvNets) in correctly classifying the structured heat map data. Many well-known ConvNet architectures are tested, and ResNet-18 performs the best with a testing accuracy of 97.8% and 0.979 F-1 score. Clearly, the heat maps generated by the proposed algorithm, despite using only a few of the many available plan parameters, retain enough information for correct treatment site classification. The simple construction and ease of interpretation make the heat maps an attractive choice for classification and error detection.

12.
Physiol Meas ; 43(10), 2022 Oct 31.
Article in English | MEDLINE | ID: mdl-36195081

ABSTRACT

Objective. Due to the variability of human movements, muscle activations vary among trials and subjects. However, few studies have investigated how data organization methods for addressing variability impact the extracted muscle synergies. Approach. Fifteen healthy subjects performed a large set of upper limb multi-directional point-to-point reaching movements. The study then extracted muscle synergies under different data settings and investigated how the data structure prior to synergy extraction (concatenation, averaging, or single trial), the number of considered trials, and the number of reaching directions affected the number and components of muscle synergies. Main results. The results showed that the number and components of synergies were significantly affected by the data structure. The concatenation method identified the highest number of synergies, and the averaging method usually found a smaller number of synergies. When the number of concatenated trials or reaching directions was lower than a minimum value, the number of synergies increased with the number of trials or reaching directions; however, once the number of trials or reaching directions reached a threshold, the number of synergies was usually constant or varied less, even when novel directions and trials were added. Similarity analysis also showed a slight increase when the number of trials or reaching directions was lower than a threshold. This study recommends that at least five trials and four reaching directions, together with the concatenation method, be considered in muscle synergy analysis during upper limb tasks. Significance. This study helps researchers focus on the variability induced by disease rather than by the techniques applied for synergy analysis, and promotes applications of muscle synergies in clinical scenarios.
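
The difference between the concatenation and averaging data structures can be shown on a toy example. This sketch covers only how trials are arranged before synergy extraction; the extraction step itself is not shown, and the study's actual preprocessing is not described in the abstract. Each trial is a hypothetical muscles-by-samples matrix stored as nested lists, and averaging assumes all trials have been time-normalized to the same length.

```python
def concatenate_trials(trials):
    """Join trials along the time axis: muscles x (samples summed over trials)."""
    n_muscles = len(trials[0])
    return [[sample for trial in trials for sample in trial[m]]
            for m in range(n_muscles)]


def average_trials(trials):
    """Average trials sample by sample: muscles x samples (equal lengths assumed)."""
    n_muscles = len(trials[0])
    n_samples = len(trials[0][0])
    return [[sum(trial[m][t] for trial in trials) / len(trials)
             for t in range(n_samples)]
            for m in range(n_muscles)]


# Two hypothetical trials, each with 2 muscles and 2 time samples.
trials = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
]
concatenated = concatenate_trials(trials)  # keeps trial-to-trial variability
averaged = average_trials(trials)          # smooths it out
```

Concatenation preserves every trial's activation pattern in the matrix handed to the factorization, which is consistent with the abstract's finding that it yields more synergies than averaging.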


Subjects
Movement, Skeletal Muscle, Humans, Electromyography, Biomechanical Phenomena, Skeletal Muscle/physiology, Movement/physiology, Upper Extremity
13.
Algorithmica ; 84(11): 3223-3245, 2022.
Article in English | MEDLINE | ID: mdl-36313790

ABSTRACT

We study a variant of the classical membership problem in automata theory, which consists of deciding whether a given input word is accepted by a given automaton. We do so through the lenses of parameterized dynamic data structures: we assume that the automaton is fixed and its size is the parameter, while the input word is revealed as in a stream, one symbol at a time following the natural order on positions. The goal is to design a dynamic data structure that can be efficiently updated upon revealing the next symbol, while maintaining the answer to the query on whether the word consisting of symbols revealed so far is accepted by the automaton. We provide complexity bounds for this dynamic acceptance problem for timed automata that process symbols interleaved with time spans. The main contribution is a dynamic data structure that maintains acceptance of a fixed one-clock timed automaton $\mathcal{A}$ with amortized update time $2^{O(|\mathcal{A}|)}$ per input symbol.
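
For intuition about the dynamic acceptance problem, here is a sketch of the much simpler untimed case: a fixed nondeterministic finite automaton whose set of reachable states is updated as each symbol arrives. It is not the paper's data structure for timed automata, and the class name and example automaton are hypothetical. The update cost depends only on the automaton, not on how many symbols have been read so far.

```python
class StreamingAcceptor:
    """Maintain acceptance of a fixed NFA over a word revealed symbol by symbol."""

    def __init__(self, transitions, start, accepting):
        # transitions maps (state, symbol) to a set of successor states.
        self.transitions = transitions
        self.accepting = set(accepting)
        self.current = {start}

    def feed(self, symbol):
        """Consume one symbol and report whether the word so far is accepted."""
        self.current = {
            target
            for state in self.current
            for target in self.transitions.get((state, symbol), ())
        }
        return self.accepted()

    def accepted(self):
        return bool(self.current & self.accepting)


# Hypothetical NFA over {a, b} accepting exactly the words that end in "ab".
ends_in_ab = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
acceptor = StreamingAcceptor(ends_in_ab, start=0, accepting={2})
answers = [acceptor.feed(symbol) for symbol in "abab"]
```

Timed automata make this harder because the state also includes clock values, so a finite set of reachable states no longer suffices; that is the gap the paper's structure addresses.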

14.
Brief Bioinform ; 23(6), 2022 Nov 19.
Article in English | MEDLINE | ID: mdl-36151725

ABSTRACT

Accurately identifying cell populations is paramount to the quality of downstream analyses and overall interpretations of single-cell RNA-seq (scRNA-seq) datasets but remains a challenge. The quality of single-cell clustering depends on the proximity metric used to generate cell-to-cell distances. Accordingly, proximity metrics have been benchmarked for scRNA-seq clustering, typically with results averaged across datasets to identify the highest-performing metric. However, the 'best-performing' metric varies between studies, with the performance differing significantly between datasets. This suggests that the unique structural properties of an scRNA-seq dataset, specific to the biological system under study, have a substantial impact on proximity metric performance. Previous benchmarking studies have not factored these structural properties into their evaluations. To address this gap, we developed a framework for the in-depth evaluation of the performance of 17 proximity metrics with respect to core structural properties of scRNA-seq data, including sparsity, dimensionality, cell-population distribution and rarity. We find that clustering performance can be improved substantially by the selection of an appropriate proximity metric and neighbourhood size for the structural properties of a dataset, in addition to performing suitable pre-processing and dimensionality reduction. Furthermore, popular metrics such as Euclidean and Manhattan distance performed poorly in comparison to several less commonly applied metrics, suggesting that the default metric for many scRNA-seq methods should be re-evaluated. Our findings highlight the critical nature of tailoring scRNA-seq analysis pipelines to the dataset under study and provide practical guidance for researchers looking to optimize cell-similarity search for the structural properties of their own data.
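
A small example shows why the choice of proximity metric matters. The sketch below implements three common metrics in plain Python and applies them to two hypothetical expression profiles that have the same shape but differ in sequencing depth. It is not the benchmarking framework from the paper, and the abstract names only Euclidean and Manhattan distance; the correlation-based distance is included here as an illustrative alternative.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def pearson_distance(x, y):
    """1 - Pearson correlation; 0 for perfectly correlated profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1 - cov / math.sqrt(var_x * var_y)


# Hypothetical cells with the same relative expression but 10x different depth.
shallow_cell = [1, 0, 2, 0, 3]
deep_cell = [10, 0, 20, 0, 30]

d_euclidean = euclidean(shallow_cell, deep_cell)
d_manhattan = manhattan(shallow_cell, deep_cell)
d_pearson = pearson_distance(shallow_cell, deep_cell)
```

The two magnitude-based distances treat these cells as far apart, while the correlation-based distance is essentially zero, which illustrates how sparsity and depth can change which cells count as neighbours.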


Subjects
Benchmarking, Single-Cell Analysis, RNA Sequence Analysis/methods, Single-Cell Analysis/methods, RNA-Seq, Cluster Analysis, Algorithms
15.
Front Neurosci ; 16: 871228, 2022.
Article in English | MEDLINE | ID: mdl-35516811

ABSTRACT

The Brain Imaging Data Structure (BIDS) is a specification for organizing, sharing, and archiving neuroimaging data and metadata in a reusable way. First developed for magnetic resonance imaging (MRI) datasets, the community-led specification evolved rapidly to include other modalities such as magnetoencephalography, positron emission tomography, and quantitative MRI (qMRI). In this work, we present an extension to BIDS for microscopy imaging data, along with example datasets. Microscopy-BIDS supports common imaging methods, including 2D/3D, ex/in vivo, micro-CT, and optical and electron microscopy. Microscopy-BIDS also includes comprehensible metadata definitions for hardware, image acquisition, and sample properties. This extension will facilitate future harmonization efforts in the context of multi-modal, multi-scale imaging such as the characterization of tissue microstructure with qMRI.
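
BIDS encodes metadata in file names as underscore-separated `key-value` entities followed by a suffix and an extension. The sketch below parses that general pattern. The example file name, including the `sample` entity and the `SEM` suffix, is an illustrative guess at what a Microscopy-BIDS file might look like and should be checked against the specification; the function name is hypothetical.

```python
def parse_bids_filename(filename):
    """Split a BIDS-style file name into (entities, suffix, extension)."""
    stem, _, extension = filename.partition(".")
    *pairs, suffix = stem.split("_")
    entities = {}
    for pair in pairs:
        key, separator, value = pair.partition("-")
        if not separator or not key or not value:
            raise ValueError(f"not a key-value entity: {pair!r}")
        entities[key] = value
    return entities, suffix, ("." + extension) if extension else ""


# Hypothetical microscopy file name used only for illustration.
entities, suffix, extension = parse_bids_filename("sub-01_sample-A_SEM.png")
```

Splitting on the first dot keeps compound extensions such as `.ome.tif` intact, which matters for imaging formats that use them.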

16.
Acta Neurochir Suppl ; 134: 207-214, 2022.
Article in English | MEDLINE | ID: mdl-34862544

ABSTRACT

Natural language processing (NLP) is the task of converting unstructured human language data into structured data that a machine can understand. While its applications are far and wide in healthcare, and are growing considerably every day, this chapter will focus on one particularly relevant application for healthcare professionals: reducing the burden of clinical documentation. More specifically, the chapter will discuss two studies (Gopinath et al., Fast, structured clinical documentation via contextual autocomplete. arXiv: 2007.15153, 2020; Greenbaum et al., Contextual autocomplete: a novel user interface using machine learning to improve ontology usage and structured data capture for presenting problems in the emergency department, 2017) that have implemented contextual autocompletion in electronic medical records and their promising results with regard to time saved for clinicians. The goals of this chapter are to introduce to the curious healthcare provider the basics of natural language processing, zoom into the use case of contextual autocomplete for electronic medical records, and provide a hands-on tutorial that introduces the basic NLP concepts required to build a model for predictive suggestions.


Subjects
Machine Learning, Natural Language Processing, Electronic Health Records, Humans
17.
Imeta ; 1(4): e56, 2022 Dec.
Article in English | MEDLINE | ID: mdl-38867905

ABSTRACT

While phylogenetic trees and associated data have been getting easier to generate, it has been difficult to reuse, combine, and synthesize the information they provide, because published trees are often only available as image files and associated data are often stored in incompatible formats. To increase the reproducibility and reusability of phylogenetic data, the ggtree object was designed for storing a phylogenetic tree and its associated data, as well as visualization directives. The ggtree object itself is a graphic object and can be rendered as a static image. More importantly, the input tree and associated data that are used in visualization can be extracted from the graphic object, making it an ideal data structure for publishing trees (image, tree, and data in one single object) and thus enhancing data reuse and analytical reproducibility, as well as facilitating integrative and comparative studies. The ggtree package is freely available at https://www.bioconductor.org/packages/ggtree.

18.
BMC Bioinformatics ; 22(1): 607, 2021 Dec 20.
Article in English | MEDLINE | ID: mdl-34930115

ABSTRACT

BACKGROUND: Biomolecular interactions that modulate biological processes occur mainly in cavities throughout the surface of biomolecular structures. In the data science era, structural biology has benefited from the increasing availability of biostructural data due to advances in structural determination and computational methods. In this scenario, data-intensive cavity analysis demands efficient scripting routines built on easily manipulated data structures. To fulfill this need, we developed pyKVFinder, a Python package to detect and characterize cavities in biomolecular structures for data science and automated pipelines. RESULTS: pyKVFinder efficiently detects cavities in biomolecular structures and computes their volume, area, depth and hydropathy, storing these cavity properties in NumPy arrays. Benefiting from Python ecosystem interoperability and data structures, pyKVFinder can be integrated with third-party scientific packages and libraries for mathematical calculations, machine learning and 3D visualization in automated workflows. As proof of pyKVFinder's capabilities, we successfully identified and compared the ADRP substrate-binding site of SARS-CoV-2 and a set of homologous proteins with pyKVFinder, showing its integrability with data science packages such as matplotlib, NGL Viewer, SciPy and Jupyter notebook. CONCLUSIONS: We introduce efficient, highly versatile and easily integrable software for detecting and characterizing biomolecular cavities in data science applications and automated protocols. pyKVFinder facilitates biostructural data analysis with scripting routines in the Python ecosystem and can serve as a building block for data science and drug design applications.


Subjects
COVID-19, Data Science, Data Analysis, Ecosystem, Humans, SARS-CoV-2
19.
Animal ; 15(12): 100411, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34837779

ABSTRACT

Genotype-by-environment interaction is caused by variation in genetic environmental sensitivity (GES), which can be subdivided into macro- and micro-GES. Macro-GES is genetic sensitivity to macro-environments (definable environments often shared by groups of animals), while micro-GES is genetic sensitivity to micro-environments (individual environments). A combined reaction norm and double hierarchical generalised linear model (RN-DHGLM) allows for simultaneous estimation of base genetic, macro- and micro-GES effects. The accuracy of variance components estimated using a RN-DHGLM has been explicitly studied for balanced data, and a recommendation of a data size with a minimum of 100 sires with at least 100 offspring each has been made. In the current study, the data size (numbers of sires and progeny) and structure requirements of the RN-DHGLM were investigated for two types of unbalanced datasets. Both datasets had a variable number of offspring per sire, but one dataset also had a variable number of offspring within macro-environments. The accuracy and bias of the estimated macro- and micro-GES effects and the estimated breeding values (EBVs) obtained using the RN-DHGLM depended on the data size. Reasonably accurate and unbiased estimates were obtained with data containing 500 sires with 20 offspring or 100 sires with 50 offspring, regardless of the data structure. Variable progeny group sizes, alone or in combination with an unequal number of offspring within macro-environments, had little impact on the dispersion of the EBVs or the bias and accuracy of variance component estimation, but resulted in lower accuracies of the EBVs. Compared to genetic correlations of zero, a genetic correlation of 0.5 between base genetic, macro- and micro-GES components resulted in a slight decrease in the percentage of replicates that converged out of 100 replicates, but had no effect on the dispersion and accuracy of variance component estimation or the dispersion of the EBVs. The results show that it is possible to apply the RN-DHGLM to unbalanced datasets to obtain estimates of variance due to macro- and micro-GES. Furthermore, the levels of accuracy and bias of variance estimates when analysing macro- and micro-GES simultaneously are determined by average family size, with limited impact from variability in family size and/or cohort size. This creates opportunities for the use of field data from populations with unbalanced data structures when estimating macro- and micro-GES.


Subjects
Genetic Models, Animals, Genotype, Linear Models
20.
Neuron ; 109(22): 3594-3608.e2, 2021 Nov 17.
Article in English | MEDLINE | ID: mdl-34592168

ABSTRACT

The large diversity of neuron types provides the means by which cortical circuits perform complex operations. Neurons can be described by biophysical and molecular characteristics, afferent inputs, and neuron targets. To quantify, visualize, and standardize those features, we developed the open-source, MATLAB-based framework CellExplorer. It consists of three components: a processing module, a flexible data structure, and a powerful graphical interface. The processing module calculates standardized physiological metrics, performs neuron-type classification, finds putative monosynaptic connections, and saves them to a standardized, yet flexible, machine-readable format. The graphical interface makes it possible to explore the computed features at the speed of a mouse click. The framework allows users to process, curate, and relate their data to a growing public collection of neurons. CellExplorer can link genetically identified cell types to physiological properties of neurons collected across laboratories and potentially lead to interlaboratory standards of single-cell metrics.


Subjects
Neurons, Neurons/physiology