Results 1 - 20 of 148
1.
Genome Res ; 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38951026

ABSTRACT

mRNA-based vaccines and therapeutics are gaining popularity and usage across a wide range of conditions. One of the critical issues when designing such mRNAs is sequence optimization. Even small proteins or peptides can be encoded by an enormously large number of mRNAs. The actual mRNA sequence can have a large impact on several properties including expression, stability, immunogenicity, and more. To enable the selection of an optimal sequence, we developed CodonBERT, a large language model (LLM) for mRNAs. Unlike prior models, CodonBERT uses codons as inputs which enables it to learn better representations. CodonBERT was trained using more than 10 million mRNA sequences from a diverse set of organisms. The resulting model captures important biological concepts. CodonBERT can also be extended to perform prediction tasks for various mRNA properties. CodonBERT outperforms previous mRNA prediction methods including on a new flu vaccine dataset.
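
To make the size of the design space and the codon-level input concrete, the following minimal sketch counts the synonymous mRNAs that encode a peptide under the standard genetic code and splits an mRNA into codon tokens. The function names and example sequences are illustrative assumptions and are not taken from CodonBERT itself.

```python
from math import prod

# Number of synonymous codons per amino acid in the standard genetic code
# (61 sense codons in total; stop codons are ignored here).
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_synonymous_mrnas(peptide: str) -> int:
    """Count the coding sequences (without a stop codon) that encode `peptide`."""
    return prod(CODON_DEGENERACY[aa] for aa in peptide.upper())


def codon_tokens(mrna: str) -> list[str]:
    """Split a coding sequence into non-overlapping codon tokens (hypothetical helper)."""
    seq = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]
```

Because the count is a product over residues, it grows exponentially with peptide length, which is why exhaustive enumeration of candidate mRNAs is impractical even for short peptides.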

2.
Bioinformatics ; 40(Supplement_1): i151-i159, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940139

ABSTRACT

MOTIVATION: Analysis of time series transcriptomics data from clinical trials is challenging. Such studies usually profile very few time points from several individuals with varying response patterns and dynamics. Current methods for these datasets are mainly based on linear, global orderings using visit times which do not account for the varying response rates and subgroups within a patient cohort. RESULTS: We developed a new method that utilizes multi-commodity flow algorithms for trajectory inference in large scale clinical studies. Recovered trajectories satisfy individual-based timing restrictions while integrating data from multiple patients. Testing the method on multiple drug datasets demonstrated an improved performance compared to prior approaches suggested for this task, while identifying novel disease subtypes that correspond to heterogeneous patient response patterns. AVAILABILITY AND IMPLEMENTATION: The source code and instructions to download the data have been deposited on GitHub at https://github.com/euxhenh/Truffle.


Subjects
Algorithms , Transcriptome , Humans , Transcriptome/genetics , Gene Expression Profiling/methods , Software
3.
Bioinformatics ; 40(7)2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38810107

ABSTRACT

MOTIVATION: Lipid nanoparticles (LNPs) are the most widely used vehicles for mRNA vaccine delivery. The structure of the lipids composing the LNPs can have a major impact on the effectiveness of the mRNA payload. Several properties should be optimized to improve delivery and expression including biodegradability, synthetic accessibility, and transfection efficiency. RESULTS: To optimize LNPs, we developed and tested models that enable the virtual screening of LNPs with high transfection efficiency. Our best method uses the lipid Simplified Molecular-Input Line-Entry System (SMILES) as inputs to a large language model. Large language model-generated embeddings are then used by a downstream gradient-boosting classifier. As we show, our method can more accurately predict lipid properties, which could lead to higher efficiency and reduced experimental time and costs. AVAILABILITY AND IMPLEMENTATION: Code and data links available at: https://github.com/Sanofi-Public/LipoBART.


Subjects
Lipids , Nanoparticles , Transfection , Nanoparticles/chemistry , Lipids/chemistry , Transfection/methods , RNA, Messenger/metabolism , Liposomes
4.
bioRxiv ; 2024 Jan 07.
Article in English | MEDLINE | ID: mdl-38260359

ABSTRACT

Direct nanopore-based RNA sequencing can be used to detect post-transcriptional base modifications, such as m6A methylation, based on the electric current signals produced by the distinct chemical structures of modified bases. A key challenge is the scarcity of adequate training data with known methylation modifications. We present Xron, a hybrid encoder-decoder framework that delivers a direct methylation-distinguishing basecaller by training on synthetic RNA data and immunoprecipitation-based experimental data in two steps. First, we generate data with more diverse modification combinations through in silico cross-linking. Second, we use this dataset to train an end-to-end neural network basecaller followed by fine-tuning on immunoprecipitation-based experimental data with label-smoothing. The trained neural network basecaller outperforms existing methylation detection methods on both read-level and site-level prediction scores. Xron is a standalone, end-to-end m6A-distinguishing basecaller capable of detecting methylated bases directly from raw sequencing signals, enabling de novo methylome assembly.

5.
Nature ; 626(7998): 367-376, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38092041

ABSTRACT

Implantation of the human embryo begins a critical developmental stage that comprises profound events including axis formation, gastrulation and the emergence of the haematopoietic system [1,2]. Our mechanistic knowledge of this window of human life remains limited due to restricted access to in vivo samples for both technical and ethical reasons [3-5]. Stem cell models of the human embryo have emerged to help unlock the mysteries of this stage [6-16]. Here we present a genetically inducible stem cell-derived embryoid model of early post-implantation human embryogenesis that captures the reciprocal codevelopment of embryonic tissue and the extra-embryonic endoderm and mesoderm niche with early haematopoiesis. This model is produced from induced pluripotent stem cells and shows unanticipated self-organizing cellular programmes similar to those that occur in embryogenesis, including the formation of amniotic cavity and bilaminar disc morphologies as well as the generation of an anterior hypoblast pole and posterior domain. The extra-embryonic layer in these embryoids lacks trophoblast and shows advanced multilineage yolk sac tissue-like morphogenesis that harbours a process similar to distinct waves of haematopoiesis, including the emergence of erythroid-, megakaryocyte-, myeloid- and lymphoid-like cells. This model presents an easy-to-use, high-throughput, reproducible and scalable platform to probe multifaceted aspects of human development and blood formation at the early post-implantation stage. It will provide a tractable human-based model for drug testing and disease modelling.


Subjects
Embryonic Development , Germ Layers , Hematopoiesis , Yolk Sac , Humans , Embryo Implantation , Endoderm/cytology , Endoderm/embryology , Germ Layers/cytology , Germ Layers/embryology , Yolk Sac/cytology , Yolk Sac/embryology , Mesoderm/cytology , Mesoderm/embryology , Induced Pluripotent Stem Cells/cytology , Amnion/cytology , Amnion/embryology , Embryoid Bodies/cytology , Cell Lineage , Developmental Biology/methods , Developmental Biology/trends
6.
bioRxiv ; 2023 Jun 16.
Article in English | MEDLINE | ID: mdl-37398213

ABSTRACT

Spatial transcriptomics promises to greatly improve our understanding of tissue organization and cell-cell interactions. While most current platforms for spatial transcriptomics only offer multi-cellular resolution, with 10-15 cells per spot, recent technologies provide a much denser spot placement leading to sub-cellular resolution. A key challenge for these newer methods is cell segmentation and the assignment of spots to cells. Traditional image-based segmentation methods are limited and do not make full use of the information profiled by spatial transcriptomics. Here we present SCS, which combines imaging data with sequencing data to improve cell segmentation accuracy. SCS assigns spots to cells by adaptively learning the position of each spot relative to the center of its cell using a transformer neural network. SCS was tested on two new sub-cellular spatial transcriptomics technologies and outperformed traditional image-based segmentation methods. SCS achieved better accuracy, identified more cells, and provided more realistic cell size estimation. Sub-cellular analysis of RNAs using SCS spot assignments provides information on RNA localization and further supports the segmentation results.

7.
bioRxiv ; 2023 Jul 24.
Article in English | MEDLINE | ID: mdl-37398391

ABSTRACT

Implantation of the human embryo commences a critical developmental stage that comprises profound morphogenetic alteration of embryonic and extra-embryonic tissues, axis formation, and gastrulation events. Our mechanistic knowledge of this window of human life remains limited due to restricted access to in vivo samples for both technical and ethical reasons. Additionally, human stem cell models of early post-implantation development with both embryonic and extra-embryonic tissue morphogenesis are lacking. Here, we present iDiscoid, produced from human induced pluripotent stem cells via an engineered synthetic gene circuit. iDiscoids exhibit reciprocal co-development of human embryonic tissue and engineered extra-embryonic niche in a model of human post-implantation. They exhibit unanticipated self-organization and tissue boundary formation that recapitulates yolk sac-like tissue specification with extra-embryonic mesoderm and hematopoietic characteristics, the formation of bilaminar disc-like embryonic morphology, the development of an amniotic-like cavity, and acquisition of an anterior-like hypoblast pole and posterior-like axis. iDiscoids offer an easy-to-use, high-throughput, reproducible, and scalable platform to probe multifaceted aspects of human early post-implantation development. Thus, they have the potential to provide a tractable human model for drug testing, developmental toxicology, and disease modeling.

8.
Nat Methods ; 20(8): 1237-1243, 2023 08.
Article in English | MEDLINE | ID: mdl-37429992

ABSTRACT

Spatial transcriptomics promises to greatly improve our understanding of tissue organization and cell-cell interactions. While most current platforms for spatial transcriptomics only offer multi-cellular resolution, with 10-15 cells per spot, recent technologies provide a much denser spot placement leading to subcellular resolution. A key challenge for these newer methods is cell segmentation and the assignment of spots to cells. Traditional image-based segmentation methods are limited and do not make full use of the information profiled by spatial transcriptomics. Here we present subcellular spatial transcriptomics cell segmentation (SCS), which combines imaging data with sequencing data to improve cell segmentation accuracy. SCS assigns spots to cells by adaptively learning the position of each spot relative to the center of its cell using a transformer neural network. SCS was tested on two new subcellular spatial transcriptomics technologies and outperformed traditional image-based segmentation methods. SCS achieved better accuracy, identified more cells and provided more realistic cell size estimation. Subcellular analysis of RNAs using SCS spot assignments provides information on RNA localization and further supports the segmentation results.


Subjects
Gene Expression Profiling , Transcriptome , Cell Communication , Cell Size , Learning
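
The idea in the SCS abstract of assigning spots to cells from each spot's position relative to its cell center can be illustrated with a deliberately simplified sketch. It assumes that some model has already produced, for each spot, an offset vector pointing toward its cell center, and that candidate cell centers are known; the function name, the `max_dist` threshold and the nearest-center rule are all hypothetical and do not reproduce the SCS implementation.

```python
from math import dist


def assign_spots(spots, offsets, centers, max_dist=5.0):
    """Assign each spot to the cell center nearest to its predicted center.

    spots    -- list of (x, y) spot coordinates
    offsets  -- list of (dx, dy) predicted offsets from spot to cell center
                (assumed to come from an upstream model; not computed here)
    centers  -- list of (x, y) candidate cell centers
    max_dist -- spots whose predicted center is farther than this from every
                candidate are left unassigned (None), e.g. background spots
    """
    if not centers:
        return [None] * len(spots)
    assignments = []
    for (x, y), (dx, dy) in zip(spots, offsets):
        target = (x + dx, y + dy)  # where this spot "thinks" its cell center is
        nearest = min(range(len(centers)), key=lambda i: dist(target, centers[i]))
        close_enough = dist(target, centers[nearest]) <= max_dist
        assignments.append(nearest if close_enough else None)
    return assignments
```

For example, with centers at (2, 2) and (10, 10), a spot at (1, 1) with offset (1, 1) is assigned to the first cell, while a distant spot with no nearby center stays unassigned.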
9.
Nat Aging ; 3(7): 776-790, 2023 07.
Article in English | MEDLINE | ID: mdl-37400722

ABSTRACT

Cellular senescence is a well-established driver of aging and age-related diseases. There are many challenges to mapping senescent cells in tissues such as the absence of specific markers and their relatively low abundance and vast heterogeneity. Single-cell technologies have allowed unprecedented characterization of senescence; however, many methodologies fail to provide spatial insights. The spatial component is essential, as senescent cells communicate with neighboring cells, impacting their function and the composition of extracellular space. The Cellular Senescence Network (SenNet), a National Institutes of Health (NIH) Common Fund initiative, aims to map senescent cells across the lifespan of humans and mice. Here, we provide a comprehensive review of the existing and emerging methodologies for spatial imaging and their application toward mapping senescent cells. Moreover, we discuss the limitations and challenges inherent to each technology. We argue that the development of spatially resolved methods is essential toward the goal of attaining an atlas of senescent cells.


Subjects
Aging , Cellular Senescence , United States , Humans , Animals , Mice , Longevity
10.
Bioinformatics ; 39(39 Suppl 1): i140-i148, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387167

ABSTRACT

MOTIVATION: Spatial proteomics data have been used to map cell states and improve our understanding of tissue organization. More recently, these methods have been extended to study the impact of such organization on disease progression and patient survival. However, to date, the majority of supervised learning methods utilizing these data types did not take full advantage of the spatial information, impacting their performance and utilization. RESULTS: Taking inspiration from ecology and epidemiology, we developed novel spatial feature extraction methods for use with spatial proteomics data. We used these features to learn prediction models for cancer patient survival. As we show, using the spatial features led to consistent improvement over prior methods that used the spatial proteomics data for the same task. In addition, feature importance analysis revealed new insights about the cell interactions that contribute to patient survival. AVAILABILITY AND IMPLEMENTATION: The code for this work can be found at gitlab.com/enable-medicine-public/spatsurv.


Subjects
Neoplasms , Proteomics , Humans , Neoplasms/diagnostic imaging , Cell Communication , Disease Progression , Survival Analysis
11.
bioRxiv ; 2023 Mar 09.
Article in English | MEDLINE | ID: mdl-36945593

ABSTRACT

Cross-regulation between hormone signaling pathways is indispensable for plant growth and development. However, the molecular mechanisms by which multiple hormones interact and co-ordinate activity need to be understood. Here, we generated a cross-regulation network explaining how hormone signals are integrated from multiple pathways in etiolated Arabidopsis (Arabidopsis thaliana) seedlings. To do so, we comprehensively characterized transcription factor activity during plant hormone responses and reconstructed dynamic transcriptional regulatory models for six hormones: abscisic acid, brassinosteroid, ethylene, jasmonic acid, salicylic acid and strigolactone/karrikin. These models incorporated target data for hundreds of transcription factors and thousands of protein-protein interactions. Each hormone recruited different combinations of transcription factors, a subset of which were shared between hormones. Hub target genes existed within hormone transcriptional networks, exhibiting transcription factor activity themselves. In addition, a group of MITOGEN-ACTIVATED PROTEIN KINASES (MPKs) were identified as potential key points of cross-regulation between multiple hormones. Accordingly, the loss of function of one of these (MPK6) disrupted the global proteome, phosphoproteome and transcriptome during hormone responses. Lastly, we determined that all hormones drive substantial alternative splicing that has distinct effects on the transcriptome compared with differential gene expression, acting in early hormone responses. These results provide a comprehensive understanding of the common features of plant transcriptional regulatory pathways and how cross-regulation between hormones acts upon gene expression.

12.
Nucleic Acids Res ; 51(7): e38, 2023 04 24.
Article in English | MEDLINE | ID: mdl-36762475

ABSTRACT

Inference of global gene regulatory networks from omics data is a long-term goal of systems biology. Most methods developed for inferring transcription factor (TF)-gene interactions either relied on a small dataset or used snapshot data, which is not suitable for inferring a process that is inherently temporal. Here, we developed a new computational method that combines neural networks and multi-task learning to predict RNA velocity rather than gene expression values. This allows our method to overcome many of the problems faced by prior methods, leading to a more accurate and more comprehensive set of identified regulatory interactions. Application of our method to atlas-scale single-cell data from 6 HuBMAP tissues led to several validated and novel predictions and greatly improved on prior methods proposed for this task.


Subjects
Computational Biology , Algorithms , Gene Regulatory Networks , Systems Biology , Single-Cell Analysis , Atlases as Topic
13.
bioRxiv ; 2023 Dec 19.
Article in English | MEDLINE | ID: mdl-38187629

ABSTRACT

Many popular spatial transcriptomics techniques lack single-cell resolution. Instead, these methods measure the collective gene expression for each location from a mixture of cells, potentially containing multiple cell types. Here, we developed scResolve, a method for recovering single-cell expression profiles from spatial transcriptomics measurements at multi-cellular resolution. scResolve accurately restores expression profiles of individual cells at their locations, which is unattainable from cell type deconvolution. Applications of scResolve on human breast cancer data and human lung disease data demonstrate that scResolve enables cell type-specific differential gene expression analysis between different tissue contexts and accurate identification of rare cell populations. The spatially resolved cellular-level expression profiles obtained through scResolve facilitate more flexible and precise spatial analysis that complements raw multi-cellular level analysis.

14.
Cell Rep Methods ; 2(11): 100332, 2022 11 21.
Article in English | MEDLINE | ID: mdl-36452867

ABSTRACT

Markers are increasingly being used for several high-throughput data analysis and experimental design tasks. Examples include the use of markers for assigning cell types in scRNA-seq studies, for deconvolving bulk gene expression data, and for selecting marker proteins in single-cell spatial proteomics studies. Most marker selection methods focus on differential expression (DE) analysis. Although such methods work well for data with a few non-overlapping marker sets, they are not appropriate for large atlas-size datasets where several cell types and tissues are considered. To address this, we define the phenotype cover (PC) problem for marker selection and present algorithms that can improve the discriminative power of marker sets. Analysis of these sets on several marker-selection tasks suggests that these methods can lead to solutions that accurately distinguish different phenotypes in the data.


Subjects
Gene Expression Profiling , Single-Cell Analysis , Gene Expression Profiling/methods , Single-Cell Analysis/methods , Cluster Analysis , Algorithms , Phenotype
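
The abstract of entry 14 frames marker selection as a covering problem but does not give its formal definition. As a rough illustration only, the sketch below applies the classic greedy set-cover heuristic under the assumption that each candidate marker "covers" the pairs of phenotypes it can tell apart. The function name, the data layout and the toy separation sets are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def greedy_marker_cover(separates, phenotypes, max_markers=None):
    """Greedily pick markers until every phenotype pair is distinguished.

    separates   -- dict: marker -> set of frozenset({a, b}) phenotype pairs
                   that the marker distinguishes (assumed to be precomputed)
    phenotypes  -- iterable of phenotype labels
    max_markers -- optional cap on the number of markers selected
    Returns (chosen markers in selection order, pairs left uncovered).
    """
    uncovered = {frozenset(pair) for pair in combinations(phenotypes, 2)}
    candidates = sorted(separates)  # sorted for deterministic tie-breaking
    chosen = []
    while uncovered and candidates:
        if max_markers is not None and len(chosen) >= max_markers:
            break
        # Pick the marker that distinguishes the most still-uncovered pairs.
        best = max(candidates, key=lambda m: len(separates[m] & uncovered))
        gain = separates[best] & uncovered
        if not gain:
            break  # no remaining marker helps
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


def pair(a, b):
    return frozenset((a, b))


# Toy input: which phenotype pairs each marker is assumed to separate.
toy_separates = {
    "CD3E": {pair("B", "T"), pair("T", "NK")},
    "MS4A1": {pair("B", "T"), pair("B", "NK")},
    "NKG7": {pair("B", "NK")},
}
markers, missed = greedy_marker_cover(toy_separates, ["B", "T", "NK"])
```

On this toy input the heuristic selects two markers that together distinguish all three phenotype pairs, whereas ranking markers independently by how many pairs each separates would not guarantee that every pair is covered.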
16.
PLoS Comput Biol ; 18(9): e1010468, 2022 09.
Article in English | MEDLINE | ID: mdl-36095011

ABSTRACT

Studies comparing single-cell RNA-Seq (scRNA-Seq) data between conditions mainly focus on differences in the proportion of cell types or on differentially expressed genes. In many cases these differences are driven by changes in cell interactions, which are challenging to infer without spatial information. To determine cell-cell interactions that differ between conditions, we developed the Cell Interaction Network Inference (CINS) pipeline. CINS combines Bayesian network analysis with regression-based modeling to identify differential cell type interactions and the proteins that underlie them. We tested CINS on a disease case-control dataset and on an aging mouse dataset. In both cases CINS correctly identifies cell type interactions and the ligands involved in these interactions, improving on prior methods suggested for cell interaction predictions. We performed additional mouse aging scRNA-Seq experiments, which further support the interactions identified by CINS.


Subjects
Gene Expression Profiling , Single-Cell Analysis , Animals , Bayes Theorem , Cell Communication , Gene Expression Profiling/methods , Ligands , Mice , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods
17.
Nat Methods ; 19(10): 1306-1319, 2022 10.
Article in English | MEDLINE | ID: mdl-36064772

ABSTRACT

Hematopoietic humanized (hu) mice are powerful tools for modeling the action of the human immune system and are widely used for preclinical studies and drug discovery. However, generating a functional human T cell compartment in hu mice remains challenging, primarily due to the species-related differences between human and mouse thymus. While engrafting human fetal thymic tissues can support robust T cell development in hu mice, tissue scarcity and ethical concerns limit their wide use. Here, we describe the tissue engineering of human thymus organoids from induced pluripotent stem cells (iPSC-thymus) that can support the de novo generation of a diverse population of functional human T cells. T cells of iPSC-thymus-engrafted hu mice could mediate both cellular and humoral immune responses, including mounting robust proinflammatory responses on T cell receptor engagement, inhibiting allogeneic tumor graft growth and facilitating efficient Ig class switching. Our findings indicate that hu mice engrafted with iPSC-thymus can serve as a new animal model to study human T cell-mediated immunity and accelerate the translation of findings from animal studies into the clinic.


Subjects
Hematopoietic Stem Cell Transplantation , Induced Pluripotent Stem Cells , Animals , Disease Models, Animal , Humans , Mice , Mice, SCID , Organoids , T-Lymphocytes , Thymus Gland
18.
J Comput Biol ; 29(11): 1229-1232, 2022 11.
Article in English | MEDLINE | ID: mdl-36036832

ABSTRACT

UNIFAN is an unsupervised cell type annotation tool for single-cell RNA sequencing data (scRNA-seq). Given single-cell expression data as input, UNIFAN outputs cell clusters as well as annotations for each cluster. The clustering process utilizes information on pathways and biological processes and these are also used to annotate the resulting clusters. In this software article, we focus on how to install UNIFAN and on the main steps involved in using UNIFAN for cell type annotations.


Subjects
Gene Expression Profiling , Single-Cell Analysis , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Cluster Analysis , Software
19.
Genome Biol ; 23(1): 150, 2022 07 07.
Article in English | MEDLINE | ID: mdl-35799304

ABSTRACT

We develop scSTEM, single-cell STEM, a method for clustering dynamic profiles of genes in trajectories inferred from pseudotime ordering of single-cell RNA-seq (scRNA-seq) data. scSTEM uses one of several metrics to summarize the expression of genes and assigns a p-value to clusters enabling the identification of significant profiles and comparison of profiles across different paths. Application of scSTEM to several scRNA-seq datasets demonstrates its usefulness and ability to improve downstream analysis of biological processes. scSTEM is available at https://github.com/alexQiSong/scSTEM.


Subjects
Algorithms , Single-Cell Analysis , Cluster Analysis , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Software
20.
Genome Res ; 2022 Jun 28.
Article in English | MEDLINE | ID: mdl-35764397

ABSTRACT

One of the first steps in the analysis of single-cell RNA sequencing (scRNA-seq) data is the assignment of cell types. Although a number of supervised methods have been developed for this, in most cases such assignment is performed by first clustering cells in low-dimensional space and then assigning cell types to different clusters. To overcome noise and to improve cell type assignments, we developed UNIFAN, a neural network method that simultaneously clusters and annotates cells using known gene sets. UNIFAN combines both low-dimensional representation for all genes and cell-specific gene set activity scores to determine the clustering. We applied UNIFAN to human and mouse scRNA-seq data sets from several different organs. We show, by using knowledge about gene sets, that UNIFAN greatly outperforms prior methods developed for clustering scRNA-seq data. The gene sets assigned by UNIFAN to different clusters provide strong evidence for the cell type that is represented by this cluster, making annotations easier.
