Results 1 - 20 of 644
1.
PLoS Comput Biol ; 17(12): e1009675, 2021 12.
Article in English | MEDLINE | ID: covidwho-1619980

ABSTRACT

Identifying the epitope of an antibody is a key step in understanding its function and its potential as a therapeutic. Sequence-based clonal clustering can identify antibodies with similar epitope complementarity; however, antibodies from markedly different lineages but with similar structures can engage the same epitope. We describe a novel computational method for epitope profiling based on structural modelling and clustering. Using the method, we demonstrate that sequence-dissimilar but functionally similar antibodies can be found across the Coronavirus Antibody Database, with high accuracy (92% of antibodies in multiple-occupancy structural clusters bind to consistent domains). Our approach functionally links antibodies with distinct genetic lineages, species origins, and coronavirus specificities. This indicates greater convergence exists in the immune responses to coronaviruses than is suggested by sequence-based approaches. Our results show that applying structural analytics to large class-specific antibody databases will enable high-confidence structure-function relationships to be drawn, yielding new opportunities to identify functional convergence hitherto missed by sequence-only analysis.


Subject(s)
Antigens, Viral/chemistry , COVID-19/immunology , COVID-19/virology , Epitopes, B-Lymphocyte/chemistry , SARS-CoV-2/chemistry , SARS-CoV-2/immunology , Amino Acid Sequence , Animals , Antibodies, Neutralizing/chemistry , Antibodies, Neutralizing/genetics , Antibodies, Viral/chemistry , Antibodies, Viral/genetics , Antibodies, Viral/metabolism , Antibody Specificity , Antigen-Antibody Complex/chemistry , Antigen-Antibody Complex/genetics , Antigen-Antibody Reactions/genetics , Antigen-Antibody Reactions/immunology , Computational Biology , Coronavirus/chemistry , Coronavirus/genetics , Coronavirus/immunology , Databases, Chemical , Epitope Mapping , Epitopes, B-Lymphocyte/genetics , Humans , Mice , Models, Molecular , Pandemics , SARS-CoV-2/genetics , Single-Domain Antibodies/immunology
2.
Int J Mol Sci ; 23(1)2022 Jan 04.
Article in English | MEDLINE | ID: covidwho-1613825

ABSTRACT

(1R,5S)-1-Hydroxy-3,6-dioxa-bicyclo[3.2.1]octan-2-one, available by an efficient catalytic pyrolysis of cellulose, has been applied as a chiral building block in the synthesis of seven new nucleoside analogues, with structural modifications on the nucleobase moiety and on the carboxyl-derived unit. The inversion of configuration in the Mitsunobu reaction used in their synthesis was verified by 2D-NOESY correlations, supported by structures optimized with DFT methods. An in silico screening of these compounds as inhibitors of SARS-CoV-2 RNA-dependent RNA polymerase has been carried out in comparison with both remdesivir, a mono-phosphoramidate prodrug recently approved for COVID-19 treatment, and its ribonucleoside metabolite GS-441524. Drug-likeness prediction and docking calculations indicated compound 6 [=(3S,5S)-methyl 5-(hydroxymethyl)-3-(6-(4-methylpiperazin-1-yl)-9H-purin-9-yl)tetrahydrofuran-3-carboxylate] as the best candidate. Furthermore, molecular dynamics simulation showed a stable interaction of structure 6 in the RNA-dependent RNA polymerase (RdRp) complex and a lower average atomic fluctuation than GS-441524, suggesting good accommodation in the RdRp binding pocket.


Subject(s)
Antiviral Agents/chemical synthesis , Cellulose/chemistry , Coronavirus RNA-Dependent RNA Polymerase/antagonists & inhibitors , Nucleosides/chemical synthesis , SARS-CoV-2/enzymology , Adenosine/analogs & derivatives , Adenosine/chemistry , Adenosine/pharmacokinetics , Adenosine Monophosphate/analogs & derivatives , Adenosine Monophosphate/chemistry , Adenosine Monophosphate/pharmacokinetics , Alanine/analogs & derivatives , Alanine/chemistry , Alanine/pharmacokinetics , Antiviral Agents/chemistry , Antiviral Agents/pharmacokinetics , Computational Biology , Coronavirus RNA-Dependent RNA Polymerase/chemistry , Molecular Docking Simulation , Molecular Dynamics Simulation , Nucleosides/chemistry , Nucleosides/pharmacokinetics , Pyrolysis , SARS-CoV-2/drug effects
3.
AAPS J ; 24(1): 19, 2022 01 04.
Article in English | MEDLINE | ID: covidwho-1605878

ABSTRACT

Over the past decade, artificial intelligence (AI) and machine learning (ML) have become the breakthrough technology most anticipated to have a transformative effect on pharmaceutical research and development (R&D). This is partially driven by revolutionary advances in computational technology and the parallel dissipation of previous constraints to the collection/processing of large volumes of data. Meanwhile, the cost of bringing new drugs to market and to patients has become prohibitively high. Given these headwinds, AI/ML techniques are appealing to the pharmaceutical industry due to their automated nature, predictive capabilities, and the consequent expected increase in efficiency. ML approaches have been used in drug discovery over the past 15-20 years with increasing sophistication. The most recent aspect of drug development where positive disruption from AI/ML is starting to occur is clinical trial design, conduct, and analysis. The COVID-19 pandemic may further accelerate utilization of AI/ML in clinical trials due to an increased reliance on digital technology in clinical trial conduct. As we move towards a world where there is a growing integration of AI/ML into R&D, it is critical to get past the related buzzwords and noise. It is equally important to recognize that the scientific method is not obsolete when making inferences about data. Doing so will help in separating hope from hype and lead to informed decision-making on the optimal use of AI/ML in drug development. This manuscript aims to demystify key concepts, present use-cases and finally offer insights and a balanced view on the optimal use of AI/ML methods in R&D.


Subject(s)
Artificial Intelligence , Clinical Trials as Topic , Computational Biology , Drug Development , Machine Learning , Pharmaceutical Research , Research Design , Animals , Artificial Intelligence/trends , Computational Biology/trends , Diffusion of Innovation , Drug Development/trends , Forecasting , Humans , Machine Learning/trends , Pharmaceutical Research/trends , Research Design/trends
4.
PLoS One ; 16(12): e0262056, 2021.
Article in English | MEDLINE | ID: covidwho-1596737

ABSTRACT

Characterization of protein complexes, i.e. sets of proteins assembling into a single larger physical entity, is important, as such assemblies play many essential roles in cells such as gene regulation. From networks of protein-protein interactions, potential protein complexes can be identified computationally through the application of community detection methods, which flag groups of entities interacting with each other in certain patterns. Most community detection algorithms tend to be unsupervised and assume that communities are dense network subgraphs, which is not always true, as protein complexes can exhibit diverse network topologies. The few existing supervised machine learning methods are serial and can potentially be improved in terms of accuracy and scalability by using better-suited machine learning models and parallel algorithms. Here, we present Super.Complex, a distributed, supervised AutoML-based pipeline for overlapping community detection in weighted networks. We also propose three new evaluation measures for the outstanding issue of comparing sets of learned and known communities satisfactorily. Super.Complex learns a community fitness function from known communities using an AutoML method and applies this fitness function to detect new communities. A heuristic local search algorithm finds maximally scoring communities, and a parallel implementation can be run on a computer cluster for scaling to large networks. On a yeast protein-interaction network, Super.Complex outperforms 6 other supervised and 4 unsupervised methods. Application of Super.Complex to a human protein-interaction network with ~8k nodes and ~60k edges yields 1,028 protein complexes, with 234 complexes linked to SARS-CoV-2, the COVID-19 virus, with 111 uncharacterized proteins present in 103 learned complexes. Super.Complex is generalizable with the ability to improve results by incorporating domain-specific features. Learned community characteristics can also be transferred from existing applications to detect communities in a new application with no known communities. Code and interactive visualizations of learned human protein complexes are freely available at: https://sites.google.com/view/supercomplex/super-complex-v3-0.


Subject(s)
Computational Biology/methods , Protein Interaction Maps , Proteins/immunology , Supervised Machine Learning , Viral Proteins/immunology , COVID-19/immunology , Humans , Protein Binding , Protein Interaction Mapping , SARS-CoV-2/immunology
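
Illustrative note (not part of the record): the abstract above describes scoring candidate communities with a fitness function and growing them by heuristic local search. A minimal sketch of that general idea follows. It assumes edge density as a stand-in for the learned fitness function and a simple greedy growth rule; the names `build_adjacency`, `density`, and `grow_community` and the toy network are hypothetical, and none of this is taken from the Super.Complex code.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def density(nodes, adj):
    """Stand-in fitness (an assumption): fraction of possible internal edges present."""
    n = len(nodes)
    if n < 2:
        return 0.0
    internal = sum(1 for u, v in combinations(sorted(nodes), 2) if v in adj[u])
    return internal / (n * (n - 1) / 2)


def grow_community(seed_edge, adj, fitness=density, max_size=10):
    """Greedily add the neighbor that gives the best fitness; stop when fitness would drop."""
    community = set(seed_edge)
    score = fitness(community, adj)
    while len(community) < max_size:
        frontier = set()
        for u in community:
            frontier |= adj[u]
        frontier -= community
        candidates = [(fitness(community | {n}, adj), n) for n in sorted(frontier)]
        if not candidates:
            break
        best_score, best_node = max(candidates, key=lambda c: c[0])
        if best_score < score:
            break  # adding any neighbor would lower the score
        community.add(best_node)
        score = best_score
    return community, score


# Toy network (hypothetical): a triangle A-B-C with a tail C-D-E.
toy_adj = build_adjacency([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])
community, score = grow_community(("A", "B"), toy_adj)
print(sorted(community), score)
```

In a supervised setting the `fitness` argument would be replaced by a model trained on known communities; density is used here only to keep the sketch self-contained.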
5.
Genet Res (Camb) ; 2021: 2728757, 2021.
Article in English | MEDLINE | ID: covidwho-1593203

ABSTRACT

Coronavirus disease 2019 (COVID-19) is an acutely infectious pneumonia. Currently, the specific causes and treatment targets of COVID-19 are still unclear. Herein, comprehensive bioinformatics methods were employed to analyze the hub genes in COVID-19 and to reveal its potential mechanisms. First, 34 groups of COVID-19 lung tissues and 17 groups of lung tissues from other diseases were selected from the GSE151764 gene expression profile. Analysis of differentially expressed genes (DEGs) in the samples with the limma software package yielded 84 upregulated and 46 downregulated DEGs. Next, the Database for Annotation, Visualization, and Integrated Discovery (DAVID) was used to test the DEGs for enrichment in Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The upregulated DEGs were enriched in the type I interferon signaling pathway, AGE-RAGE signaling pathway in diabetic complications, coronavirus disease, etc. Downregulated DEGs were enriched in cellular response to cytokine stimulus, IL-17 signaling pathway, FoxO signaling pathway, etc. Then, gene set enrichment analysis (GSEA) over GO terms showed that the gene set was enriched in the positive regulation of myeloid leukocyte cytokine production involved in immune response, programmed necrotic cell death, translesion synthesis, necroptotic process, and condensed nuclear chromosome. Finally, with the help of STRING tools, protein-protein interaction (PPI) network diagrams of the DEGs were constructed. With degree ≥13 as the cutoff, 3 upregulated hub genes (ISG15, FN1, and HLA-G) and 4 downregulated hub genes (FOXP3, CXCR4, MMP9, and CD69) were identified. All these findings will help us to understand the potential molecular mechanisms of COVID-19, which is also of great significance for its diagnosis and prevention.


Subject(s)
COVID-19 , Computational Biology , Gene Expression Profiling , Humans , SARS-CoV-2 , Signal Transduction , Transcriptome
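
Illustrative note (not part of the record): the final step in the abstract above, keeping genes whose degree in the PPI network meets a cutoff, can be sketched as below. The function name `hub_genes` and the toy edge list are hypothetical and are not data from the study; the abstract's cutoff of degree ≥13 would be passed as `min_degree=13` on a real network.

```python
from collections import defaultdict


def hub_genes(edges, min_degree):
    """Return (gene, degree) pairs with degree >= min_degree in an undirected network.

    Results are sorted by descending degree, then by gene name.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    hubs = [(gene, len(partners)) for gene, partners in neighbors.items()
            if len(partners) >= min_degree]
    return sorted(hubs, key=lambda item: (-item[1], item[0]))


# Toy interaction list (hypothetical labels, not real gene names).
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
print(hub_genes(toy_edges, min_degree=3))
```

Using sets for neighbors means duplicate edges in the input do not inflate a gene's degree.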
6.
Mol Med ; 27(1): 161, 2021 12 20.
Article in English | MEDLINE | ID: covidwho-1582119

ABSTRACT

BACKGROUND: Similarities in the hijacking mechanisms used by SARS-CoV-2 and several types of cancer suggest the repurposing of cancer drugs to treat COVID-19. CK2 kinase antagonists have been proposed for cancer treatment. A recent study in cells infected with SARS-CoV-2 found a significant CK2 kinase activity, and the use of a CK2 inhibitor showed antiviral responses. CIGB-300, originally designed as an anticancer peptide, is an antagonist of CK2 kinase activity that binds to the CK2 phospho-acceptor sites. Recent preliminary results show the antiviral activity of CIGB-300 using a surrogate model of coronavirus. Here we present a computational biology study that provides evidence, at the molecular level, of how CIGB-300 may interfere with the SARS-CoV-2 life cycle within infected human cells. METHODS: Sequence analyses and data from phosphorylation studies were combined to predict infection-induced molecular mechanisms that can be interfered with by CIGB-300. Next, we integrated data from multi-omics studies and data focusing on the antagonistic effect on the CK2 kinase activity of CIGB-300. A combination of network and functional enrichment analyses was used. RESULTS: Firstly, from the SARS-CoV studies, we inferred the potential influence of CIGB-300 on SARS-CoV-2 interference with the immune response. Afterwards, from the analysis of multiple omics data, we proposed the action of CIGB-300 from the early stages of viral infections perturbing the virus hijacking of RNA splicing machinery. We also predicted the interference of CIGB-300 in virus-host interactions that are responsible for the high infectivity and the particular immune response to SARS-CoV-2 infection. Furthermore, we provided evidence of how CIGB-300 may participate in the attenuation of phenotypes related to muscle, bleeding, coagulation and respiratory disorders. CONCLUSIONS: Our computational analysis proposes putative molecular mechanisms that support the antiviral activity of CIGB-300.


Subject(s)
COVID-19/metabolism , Computational Biology/methods , Animals , COVID-19/drug therapy , Caco-2 Cells , Chlorocebus aethiops , Humans , Nuclear Pore Complex Proteins/therapeutic use , Peptides, Cyclic/therapeutic use , SARS-CoV-2/drug effects , SARS-CoV-2/pathogenicity , Vero Cells
7.
Eur J Med Res ; 26(1): 146, 2021 Dec 17.
Article in English | MEDLINE | ID: covidwho-1582003

ABSTRACT

BACKGROUND: At the end of 2019, the world witnessed the emergence and ravages of a viral infection induced by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Also known as the coronavirus disease 2019 (COVID-19), it has been identified as a public health emergency of international concern (PHEIC) by the World Health Organization (WHO) because of its severity. METHODS: The gene data of 51 samples were extracted from the GSE150316 and GSE147507 data set and then processed by means of the programming language R, through which the differentially expressed genes (DEGs) that meet the standards were screened. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were performed on the selected DEGs to understand the functions and approaches of DEGs. The online tool STRING was employed to construct a protein-protein interaction (PPI) network of DEGs and, in turn, to identify hub genes. RESULTS: A total of 52 intersection genes were obtained through DEG identification. Through the GO analysis, we realized that the biological processes (BPs) that have the deepest impact on the human body after SARS-CoV-2 infection are various immune responses. By using STRING to construct a PPI network, 10 hub genes were identified, including IFIH1, DDX58, ISG15, EGR1, OASL, SAMD9, SAMD9L, XAF1, IFITM1, and TNFSF10. CONCLUSION: The results of this study will hopefully provide guidance for future studies on the pathophysiological mechanism of SARS-CoV-2 infection.


Subject(s)
COVID-19/genetics , Computational Biology/methods , Gene Expression Regulation/genetics , Lung/pathology , Protein Interaction Maps/genetics , COVID-19/pathology , Databases, Genetic , Gene Expression Profiling , Gene Ontology , Humans , Immunity, Humoral/genetics , Immunity, Humoral/immunology , Lung/virology , Neutrophil Activation/genetics , Neutrophil Activation/immunology , Neutrophils/immunology , SARS-CoV-2 , Transcriptome/genetics
8.
PLoS Comput Biol ; 17(12): e1009629, 2021 12.
Article in English | MEDLINE | ID: covidwho-1581906

ABSTRACT

Identifying order of symptom onset of infectious diseases might aid in differentiating symptomatic infections earlier in a population thereby enabling non-pharmaceutical interventions and reducing disease spread. Previously, we developed a mathematical model predicting the order of symptoms based on data from the initial outbreak of SARS-CoV-2 in China using symptom occurrence at diagnosis and found that the order of COVID-19 symptoms differed from that of other infectious diseases including influenza. Whether this order of COVID-19 symptoms holds in the USA under changing conditions is unclear. Here, we use modeling to predict the order of symptoms using data from both the initial outbreaks in China and in the USA. Whereas patients in China were more likely to have fever before cough and then nausea/vomiting before diarrhea, patients in the USA were more likely to have cough before fever and then diarrhea before nausea/vomiting. Given that the D614G SARS-CoV-2 variant that rapidly spread from Europe to predominate in the USA during the first wave of the outbreak was not present in the initial China outbreak, we hypothesized that this mutation might affect symptom order. Supporting this notion, we found that as SARS-CoV-2 in Japan shifted from the original Wuhan reference strain to the D614G variant, symptom order shifted to the USA pattern. Google Trends analyses supported these findings, while weather, age, and comorbidities did not affect our model's predictions of symptom order. These findings indicate that symptom order can change with mutation in viral disease and raise the possibility that D614G variant is more transmissible because infected people are more likely to cough in public before being incapacitated with fever.


Subject(s)
COVID-19/diagnosis , COVID-19/virology , Models, Biological , SARS-CoV-2 , COVID-19/epidemiology , China/epidemiology , Computational Biology , Cough/etiology , Diarrhea/etiology , Fever/etiology , Humans , Japan/epidemiology , Mutation , Nausea/etiology , Pandemics , SARS-CoV-2/genetics , SARS-CoV-2/pathogenicity , Time Factors , United States/epidemiology , Vomiting/etiology
9.
Front Immunol ; 12: 774776, 2021.
Article in English | MEDLINE | ID: covidwho-1581334

ABSTRACT

Both RNA N6-methyladenosine (m6A) modification of SARS-CoV-2 and immune characteristics of the human body have been reported to play an important role in COVID-19, but how the m6A methylation modification of leukocytes responds to the virus infection remains unknown. Based on the RNA-seq of 126 samples from the GEO database, we found that the m6A modification level of blood leukocytes is remarkably higher in patients with COVID-19 than in patients without COVID-19, and that this difference was related to CD4+ T cells. Two clusters were identified by unsupervised clustering: m6A cluster A, characterized by T cell activation, had a better prognosis than m6A cluster B. Elevated metabolism level, blockage of the immune checkpoint, and lower level of m6A score were observed in m6A cluster B. A protective model was constructed based on nine selected genes and it exhibited an excellent predictive value in COVID-19. Further analysis revealed that the protective score was positively correlated with HFD45 and ventilator-free days, and negatively correlated with SOFA score, APACHE-II score, and CRP. Our work systematically depicted a complicated correlation between m6A methylation modification and host lymphocytes in patients infected with SARS-CoV-2 and provided a well-performing model to predict the patients' outcomes.


Subject(s)
Adenosine/analogs & derivatives , COVID-19/immunology , COVID-19/virology , Host-Pathogen Interactions/immunology , Leukocytes/immunology , RNA, Viral/genetics , SARS-CoV-2/physiology , Adenosine/metabolism , Cluster Analysis , Computational Biology/methods , Disease Susceptibility/immunology , Gene Expression Profiling , Humans , Leukocytes/metabolism , RNA, Viral/metabolism , ROC Curve
10.
Molecules ; 27(1)2021 Dec 30.
Article in English | MEDLINE | ID: covidwho-1580564

ABSTRACT

The COVID-19 pandemic has caused millions of fatalities since 2019. Despite the availability of vaccines for this disease, new strains are causing rapid ailment and are a continuous threat to vaccine efficacy. Here, molecular docking and simulations identify strong inhibitors of the allosteric site of the SARS-CoV-2 RNA-dependent RNA polymerase (RdRp). More than one hundred different flavonoids were docked with the SARS-CoV-2 RdRp allosteric site through computational screening. The three top hits were Naringoside, Myricetin and Aureusidin 4,6-diglucoside. Simulation analyses confirmed that they remain in constant contact during the simulation time course and have strong association with the enzyme's allosteric site. Absorption, distribution, metabolism, excretion and toxicity (ADMET) data provided medicinal information on these top three hits. They had good human intestinal absorption (HIA) concentrations and were non-toxic. Due to high mutation rates in the active sites of the viral enzyme, these new allosteric site inhibitors offer opportunities to drug SARS-CoV-2 RdRp. These results provide new information for the design of novel allosteric inhibitors against SARS-CoV-2 RdRp.


Subject(s)
Antiviral Agents/pharmacology , COVID-19/drug therapy , Computational Biology/methods , Coronavirus RNA-Dependent RNA Polymerase/antagonists & inhibitors , Drug Evaluation, Preclinical , Flavonoids/pharmacology , SARS-CoV-2/enzymology , Allosteric Site , COVID-19/virology , Catalytic Domain , Drug Design , Humans , Intestinal Absorption , Molecular Docking Simulation
11.
PLoS Comput Biol ; 17(12): e1009697, 2021 12.
Article in English | MEDLINE | ID: covidwho-1571974

ABSTRACT

For the control of COVID-19, vaccination programmes provide a long-term solution. The amount of available vaccines is often limited, and thus it is crucial to determine the allocation strategy. While mathematical modelling approaches have been used to find an optimal distribution of vaccines, there is an excessively large number of possible allocation schemes to be simulated. Here, we propose an algorithm to find a near-optimal allocation scheme given an intervention objective such as minimization of new infections, hospitalizations, or deaths, where multiple vaccines are available. The proposed principle for allocating vaccines is to target subgroups with the largest reduction in the outcome of interest. We use an approximation method to reconstruct the age-specific transmission intensity (the next generation matrix), and express the expected impact of vaccinating each subgroup in terms of the observed incidence of infection and force of infection. The proposed approach is firstly evaluated with a simulated epidemic and then applied to the epidemiological data on COVID-19 in the Netherlands. Our results reveal how the optimal allocation depends on the objective of infection control. In the case of COVID-19, if we wish to minimize deaths, the optimal allocation strategy is not efficient for minimizing other outcomes, such as infections. In simulated epidemics, an allocation strategy optimized for an outcome outperforms other strategies such as the allocation from young to old, from old to young, and at random. Our simulations clarify that the current policy in the Netherlands (i.e., allocation from old to young) was concordant with the allocation scheme that minimizes deaths. The proposed method provides an optimal allocation scheme, given routine surveillance data that reflect ongoing transmissions. This approach to allocation is useful for providing plausible simulation scenarios for complex models, which give a more robust basis to determine intervention strategies.


Subject(s)
Algorithms , COVID-19 Vaccines/therapeutic use , COVID-19/prevention & control , SARS-CoV-2 , Vaccination/methods , Age Factors , COVID-19/epidemiology , COVID-19/immunology , COVID-19 Vaccines/supply & distribution , Computational Biology , Computer Simulation , Health Care Rationing/methods , Health Care Rationing/statistics & numerical data , Humans , Mass Vaccination/methods , Mass Vaccination/statistics & numerical data , Netherlands/epidemiology , Pandemics/prevention & control , Pandemics/statistics & numerical data , SARS-CoV-2/immunology , Vaccination/statistics & numerical data
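
Illustrative note (not part of the record): the allocation principle stated in the abstract above, targeting subgroups with the largest reduction in the outcome of interest, can be sketched as a greedy loop. This sketch assumes a fixed expected reduction per dose for each subgroup, which is a simplification introduced here; the article derives the impact from observed incidence and force of infection, and that step is not reproduced. The function name `greedy_allocation`, the age bands, and all numbers are hypothetical.

```python
def greedy_allocation(groups, doses):
    """Allocate doses to subgroups in order of expected outcome reduction per dose.

    groups: {name: (unvaccinated_population, reduction_per_dose)}
    doses:  total number of doses available
    Returns {name: doses_allocated}.
    """
    allocation = {name: 0 for name in groups}
    remaining = doses
    # Highest per-dose benefit first; ties broken by name for determinism.
    ranked = sorted(groups.items(), key=lambda kv: (-kv[1][1], kv[0]))
    for name, (population, _benefit) in ranked:
        if remaining <= 0:
            break
        given = min(population, remaining)
        allocation[name] = given
        remaining -= given
    return allocation


# Toy inputs (hypothetical): per-dose reduction in deaths is highest in the oldest group.
toy_groups = {
    "0-19": (100, 0.001),
    "20-59": (300, 0.004),
    "60+": (150, 0.020),
}
print(greedy_allocation(toy_groups, doses=250))
```

Changing the per-dose values to reflect a different objective, such as infections instead of deaths, changes the ranking and therefore the allocation, which matches the abstract's point that the optimal scheme depends on the objective.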
12.
PLoS Comput Biol ; 17(12): e1009664, 2021 12.
Article in English | MEDLINE | ID: covidwho-1571973

ABSTRACT

The evolution of circulating viruses is shaped by their need to evade antibody response, which mainly targets the viral spike. Because of the high density of spikes on the viral surface, not all antigenic sites are targeted equally by antibodies. We offer here a geometry-based approach to predict and rank the probability of surface residues of SARS spike (S protein) and influenza H1N1 spike (hemagglutinin) to acquire antibody-escaping mutations utilizing in-silico models of viral structure. We used coarse-grained MD simulations to estimate the on-rate (targeting) of an antibody model to surface residues of the spike protein. Analyzing publicly available sequences, we found that spike surface sequence diversity of the pre-pandemic seasonal influenza H1N1 and the sarbecovirus subgenus highly correlates with our model prediction of antibody targeting. In particular, we identified an antibody-targeting gradient, which matches a mutability gradient along the main axis of the spike. This identifies the role of viral surface geometry in shaping the evolution of circulating viruses. For the 2009 H1N1 and SARS-CoV-2 pandemics, a mutability gradient along the main axis of the spike was not observed. Our model further allowed us to identify key residues of the SARS-CoV-2 spike at which antibody escape mutations have now occurred. Therefore, it can inform of the likely functional role of observed mutations and predict at which residues antibody-escaping mutation might arise.


Subject(s)
Evolution, Molecular , Influenza A Virus, H1N1 Subtype/genetics , Influenza A Virus, H1N1 Subtype/immunology , SARS-CoV-2/genetics , SARS-CoV-2/immunology , Spike Glycoprotein, Coronavirus/genetics , Spike Glycoprotein, Coronavirus/immunology , Viral Envelope Proteins/genetics , Viral Envelope Proteins/immunology , Animals , Antibodies, Viral/biosynthesis , Antigens, Viral/chemistry , Antigens, Viral/genetics , COVID-19/epidemiology , COVID-19/immunology , COVID-19/virology , Computational Biology , Coronavirus Infections/immunology , Coronavirus Infections/virology , Epitopes, B-Lymphocyte/chemistry , Epitopes, B-Lymphocyte/genetics , Hemagglutinin Glycoproteins, Influenza Virus/chemistry , Hemagglutinin Glycoproteins, Influenza Virus/genetics , Hemagglutinin Glycoproteins, Influenza Virus/immunology , Host Microbial Interactions/genetics , Host Microbial Interactions/immunology , Humans , Immune Evasion/genetics , Influenza, Human/immunology , Influenza, Human/virology , Models, Immunological , Molecular Dynamics Simulation , Mutation , Pandemics , Spike Glycoprotein, Coronavirus/chemistry , Viral Envelope Proteins/chemistry
13.
BMC Med Genomics ; 14(Suppl 6): 289, 2021 12 14.
Article in English | MEDLINE | ID: covidwho-1571758

ABSTRACT

BACKGROUND: Virus screening and viral genome reconstruction are urgent and crucial for the rapid identification of viral pathogens, i.e., tracing the source and understanding the pathogenesis when a viral outbreak occurs. Next-generation sequencing (NGS) provides an efficient and unbiased way to identify viral pathogens in host-associated and environmental samples without prior knowledge. Despite the availability of software, data analysis still requires human operations. A mature pipeline is urgently needed when thousands of viral pathogen and viral genome reconstruction samples need to be rapidly identified. RESULTS: In this paper, we present a rapid and accurate workflow to screen metagenomics sequencing data for viral pathogens and other compositions, as well as enable a reference-based assembler to reconstruct viral genomes. Moreover, we tested our workflow on several metagenomics datasets, including a SARS-CoV-2 patient sample with NGS data, pangolin tissues with NGS data, Middle East Respiratory Syndrome (MERS)-infected cells with NGS data, etc. Our workflow demonstrated high accuracy and efficiency when identifying target viruses from large scale NGS metagenomics data. Our workflow was flexible when working with a broad range of NGS datasets from small (kb) to large (100 Gb). This took from a few minutes to a few hours to complete each task. At the same time, our workflow automatically generates reports that incorporate visualized feedback (e.g., metagenomics data quality statistics, host and viral sequence compositions, details about each of the identified viral pathogens and their coverages, and reassembled viral pathogen sequences based on their closest references). CONCLUSIONS: Overall, our system enabled the rapid screening and identification of viral pathogens from metagenomics data, providing an important piece to support viral pathogen research during a pandemic. The visualized report contains information from raw sequence quality to a reconstructed viral sequence, which allows non-professional people to screen their samples for viruses by themselves (Additional file 1).


Subject(s)
COVID-19 Testing/methods , COVID-19/diagnosis , Computational Biology/methods , Genome, Viral , Genomics , Metagenomics , SARS-CoV-2/genetics , Algorithms , Animals , Automation , Coronavirus Infections/genetics , High-Throughput Nucleotide Sequencing , Humans , Mass Screening/methods , Pandemics , Pangolins , Reference Values , Software , Transcriptome , Workflow
14.
Viruses ; 13(12)2021 12 03.
Article in English | MEDLINE | ID: covidwho-1554806

ABSTRACT

SARS-CoV-2 genomic sequencing efforts have scaled dramatically to address the current global pandemic and aid public health. However, autonomous genome annotation of SARS-CoV-2 genes, proteins, and domains is not readily accomplished by existing methods and results in missing or incorrect sequences. To overcome this limitation, we developed a novel semi-supervised pipeline for automated gene, protein, and functional domain annotation of SARS-CoV-2 genomes that differentiates itself by not relying on the use of a single reference genome and by overcoming atypical genomic traits that challenge traditional bioinformatic methods. We analyzed an initial corpus of 66,000 SARS-CoV-2 genome sequences collected from labs across the world using our method and identified the comprehensive set of known proteins with 98.5% set membership accuracy and 99.1% accuracy in length prediction, compared to proteome references, including Replicase polyprotein 1ab (with its transcriptional slippage site). Compared to other published tools, such as Prokka (base) and VAPiD, we yielded a 6.4- and 1.8-fold increase in protein annotations. Our method generated 13,000,000 gene, protein, and domain sequences, some conserved across time and geography and others representing emerging variants. We observed 3362 non-redundant sequences per protein on average within this corpus and described key D614G and N501Y variants spatiotemporally in the initial genome corpus. For spike glycoprotein domains, we achieved greater than 97.9% sequence identity to references and characterized receptor binding domain variants. We further demonstrated the robustness and extensibility of our method on an additional 4000 variant diverse genomes containing all named variants of concern and interest as of August 2021. In this cohort, we successfully identified all keystone spike glycoprotein mutations in our predicted protein sequences with greater than 99% accuracy as well as demonstrating high accuracy of the protein and domain annotations. This work comprehensively presents the molecular targets to refine biomedical interventions for SARS-CoV-2 with a scalable, high-accuracy method to analyze newly sequenced infections as they arise.


Subject(s)
COVID-19/virology , Genome, Viral , Molecular Sequence Annotation , SARS-CoV-2/genetics , Amino Acid Sequence , Base Sequence , Computational Biology , Humans , Mutation , Protein Binding , Protein Domains , Spike Glycoprotein, Coronavirus/genetics
15.
Int J Mol Sci ; 22(24)2021 Dec 07.
Article in English | MEDLINE | ID: covidwho-1554804

ABSTRACT

In the last few years, microRNA-mediated regulation has been shown to be important in viral infections. Viral microRNAs can alter cell physiology and act on the immune system; moreover, cellular microRNAs can regulate the virus cycle, positively or negatively influencing viral replication. Accordingly, microRNAs can represent diagnostic and prognostic biomarkers of infectious processes and a promising approach for designing targeted therapies. In the past 18 months, the COVID-19 infection from SARS-CoV-2 has engaged many researchers in the search for diagnostic and prognostic markers and the development of therapies. Although some research suggests that the SARS-CoV-2 genome can produce microRNAs and that host microRNAs may be involved in the cellular response to the virus, to date, not enough evidence has been provided. In this paper, using a focused bioinformatic approach exploring the SARS-CoV-2 genome, we propose that SARS-CoV-2 is able to produce microRNAs sharing strong sequence homology with human microRNAs, and that human microRNAs may target viral RNA, regulating the virus life cycle inside human cells. Interestingly, all viral miRNA sequences and some human miRNA target sites are conserved in more recent SARS-CoV-2 variants of concern (VOCs). Although experimental evidence will be needed, in silico analysis represents a valuable source of information useful to understand the sophisticated molecular mechanisms of disease and to sustain biomedical applications.


Subject(s)
MicroRNAs/genetics , SARS-CoV-2/genetics , Virus Replication/genetics , COVID-19/genetics , Computational Biology/methods , DNA Viruses/genetics , Gene Expression/genetics , Gene Expression Regulation, Viral/genetics , Genome, Viral/genetics , Host-Pathogen Interactions/genetics , RNA, Viral/genetics , Sequence Homology
16.
OMICS ; 25(11): 681-692, 2021 11.
Article in English | MEDLINE | ID: covidwho-1541502

ABSTRACT

Multiomics study designs have significantly increased understanding of complex biological systems. The multiomics literature is rapidly expanding and so is its heterogeneity. However, the intricacy and fragmentation of omics data are impeding further research. To examine current trends in the multiomics field, we reviewed 52 articles from PubMed and Web of Science, which used an integrated omics approach, published between March 2006 and January 2021. From these studies, data regarding investigated loci, species, omics type, and phenotype were extracted, curated, and streamlined according to standardized terminology, and summarized in a previously developed graphical summary. Evaluated studies included 21 omics types or applications of omics technology such as genomics, transcriptomics, metabolomics, epigenomics, environmental omics, and pharmacogenomics, species of various phyla including human, mouse, Arabidopsis thaliana, Saccharomyces cerevisiae, and various phenotypes, including cancer and COVID-19. In the analyzed studies, diverse methods, protocols, results, and terminology were used and, accordingly, assessment of the studies was challenging. Adoption of standardized multiomics data presentation in the future will further buttress standardization of terminology and reporting of results in systems science. This shall catalyze, we suggest, innovation in both science communication and laboratory medicine by making available scientific knowledge that is easier to grasp, share, and harness toward medical breakthroughs.


Subject(s)
Computational Biology/trends , Genomics/trends , Metabolomics/trends , Proteomics/trends , Animals , COVID-19 , Computer Graphics , Epigenomics/trends , Gene Expression Profiling/trends , Humans , Pharmacogenetics/trends , Publications , SARS-CoV-2 , Terminology as Topic
17.
Comput Math Methods Med ; 2021: 7259414, 2021.
Article in English | MEDLINE | ID: covidwho-1533111

ABSTRACT

In this paper, we use an improved convolutional neural network to analyze CT images of COVID-19 pneumonia in depth. First, the U-Net series of deep neural networks semantically segments the CT image, producing a binary image with the COVID-19 pneumonia area as the foreground and the remaining areas as the background, which provides a basis for subsequent image diagnosis. Second, the target-detection framework Faster R-CNN extracts features from the CT image, obtains a higher-level abstract representation of the data, determines the lesion location, and gives its bounding box in the image. Finally, a generative adversarial network is used to diagnose the lesion area of the CT image, obtain a complete image of the COVID-19 pneumonia, and reconstruct a complete three-dimensional model of it. This fills the current gap in this area, provides a basis for producing COVID-19 pneumonia prostheses, and improves the accuracy of diagnosis.


Subject(s)
Algorithms , COVID-19/diagnostic imaging , Neural Networks, Computer , Tomography, X-Ray Computed/statistics & numerical data , COVID-19/diagnosis , Computational Biology , Databases, Factual , Deep Learning , Diagnosis, Computer-Assisted/statistics & numerical data , Humans , Imaging, Three-Dimensional/statistics & numerical data , Pandemics , Radiographic Image Interpretation, Computer-Assisted/statistics & numerical data , SARS-CoV-2
18.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: covidwho-1528156

ABSTRACT

The low capture rate of expressed RNAs from single-cell sequencing technology is one of the major obstacles to downstream functional genomics analyses. Recently, a number of imputation methods have emerged for single-cell transcriptome data; however, recovering missing values in very sparse expression matrices remains a substantial challenge. Here, we propose a new algorithm, WEDGE (WEighted Decomposition of Gene Expression), to impute gene expression matrices by using a biased low-rank matrix decomposition method. WEDGE successfully recovered expression matrices, reproduced the cell-wise and gene-wise correlations, and improved the clustering of cells, performing impressively for applications with sparse datasets. Overall, this study shows a potent approach for imputing sparse expression matrix data, and our WEDGE algorithm should help many researchers to more profitably explore the biological meanings embedded in their single-cell RNA sequencing datasets. The source code of WEDGE has been released at https://github.com/QuKunLab/WEDGE.
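
As a rough illustration of what imputation by weighted low-rank decomposition can look like, the sketch below uses a generic weighted non-negative matrix factorization with multiplicative updates. It is not the WEDGE implementation: the abstract does not say how entries are weighted, so the function names, the `zero_weight` value, and the update rule are all assumptions made for illustration. Zero entries get a small weight so the factorization is driven mainly by observed counts, and the product of the two factors serves as the imputed matrix.

```python
import numpy as np


def weighted_lowrank_impute(X, rank=2, zero_weight=0.1, n_iter=200, seed=0):
    """Impute X (genes x cells) with a weighted non-negative low-rank fit.

    Hypothetical helper: minimizes sum(W * (X - U @ V)**2) with U, V >= 0,
    where W is 1 for non-zero entries and `zero_weight` for zeros.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    W = np.where(X > 0, 1.0, zero_weight)      # down-weight zero entries
    U = rng.random((X.shape[0], rank)) + 0.1   # gene factors
    V = rng.random((rank, X.shape[1])) + 0.1   # cell factors
    eps = 1e-12                                # avoids division by zero
    WX = W * X
    for _ in range(n_iter):
        # Multiplicative updates for the weighted squared-error objective.
        U *= (WX @ V.T) / ((W * (U @ V)) @ V.T + eps)
        V *= (U.T @ WX) / (U.T @ (W * (U @ V)) + eps)
    return U @ V


def weighted_error(X, X_hat, zero_weight=0.1):
    """Weighted squared reconstruction error under the same weighting."""
    X = np.asarray(X, dtype=float)
    W = np.where(X > 0, 1.0, zero_weight)
    return float(np.sum(W * (X - X_hat) ** 2))
```

Calling `weighted_lowrank_impute(counts, rank=k)` on a non-negative expression matrix returns a dense matrix of the same shape. In practice the rank and the zero weight would need tuning against held-out data.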


Subject(s)
Algorithms , Computational Biology/methods , Gene Expression Profiling/methods , RNA-Seq/methods , Single-Cell Analysis/methods , COVID-19/blood , COVID-19/genetics , COVID-19/virology , Cluster Analysis , Computer Simulation , Genomics/methods , Humans , Leukocytes, Mononuclear/classification , Leukocytes, Mononuclear/metabolism , Reproducibility of Results , SARS-CoV-2/physiology , Severity of Illness Index
19.
PLoS Comput Biol ; 17(11): e1009560, 2021 11.
Article in English | MEDLINE | ID: covidwho-1523396

ABSTRACT

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of COVID-19, is of zoonotic origin. Evolutionary analyses assessing whether coronaviruses similar to SARS-CoV-2 infected ancestral species of modern-day animal hosts could be useful in identifying additional reservoirs of potentially dangerous coronaviruses. We reasoned that if a clade of species has been repeatedly exposed to a virus, then their proteins relevant for viral entry may exhibit adaptations that affect host susceptibility or response. We perform comparative analyses across the mammalian phylogeny of angiotensin-converting enzyme 2 (ACE2), the cellular receptor for SARS-CoV-2, in order to uncover evidence for selection acting at its binding interface with the SARS-CoV-2 spike protein. We find that in rodents there is evidence for adaptive amino acid substitutions at positions comprising the ACE2-spike interaction interface, whereas the variation within ACE2 proteins in primates and some other mammalian clades is not consistent with evolutionary adaptations. We also analyze aminopeptidase N (APN), the receptor for the human coronavirus 229E, a virus that causes the common cold, and find evidence for adaptation in primates. Altogether, our results suggest that the rodent and primate lineages may have had ancient exposures to viruses similar to SARS-CoV-2 and HCoV-229E, respectively.


Subject(s)
COVID-19/genetics , COVID-19/virology , Coronavirus Infections/genetics , Coronavirus Infections/virology , SARS-CoV-2/genetics , Adaptation, Physiological/genetics , Amino Acid Substitution , Angiotensin-Converting Enzyme 2/genetics , Angiotensin-Converting Enzyme 2/physiology , Animals , CD13 Antigens/genetics , CD13 Antigens/physiology , Common Cold/genetics , Common Cold/virology , Computational Biology , Coronavirus 229E, Human/genetics , Coronavirus 229E, Human/physiology , Evolution, Molecular , Genomics , Host Microbial Interactions/genetics , Host Microbial Interactions/physiology , Host Specificity/genetics , Host Specificity/physiology , Humans , Mammals/genetics , Mammals/virology , Phylogeny , Protein Interaction Domains and Motifs/genetics , Receptors, Virus/genetics , Receptors, Virus/physiology , SARS-CoV-2/physiology , Selection, Genetic , Spike Glycoprotein, Coronavirus/genetics , Spike Glycoprotein, Coronavirus/physiology , Virus Internalization
20.
Genome ; 64(4): v-vii, 2021 Apr.
Article in English | MEDLINE | ID: covidwho-1523064
...