
ProteInfer, deep neural networks for protein functional inference.

Sanderson, Theo; Bileschi, Maxwell L; Belanger, David; Colwell, Lucy J.

Elife ; 12, 2023 02 27.

Article in English | MEDLINE | ID: mdl-36847334

ABSTRACT

Predicting the function of a protein from its amino acid sequence is a long-standing challenge in bioinformatics. Traditional approaches use sequence alignment to compare a query sequence either to thousands of models of protein families or to large databases of individual protein sequences. Here we introduce ProteInfer, which instead employs deep convolutional neural networks to directly predict a variety of protein functions - Enzyme Commission (EC) numbers and Gene Ontology (GO) terms - directly from an unaligned amino acid sequence. This approach provides precise predictions which complement alignment-based methods, and the computational efficiency of a single neural network permits novel and lightweight software interfaces, which we demonstrate with an in-browser graphical interface for protein function prediction in which all computation is performed on the user's personal computer with no data uploaded to remote servers. Moreover, these models place full-length amino acid sequences into a generalised functional space, facilitating downstream analysis and interpretation. To read the interactive version of this paper, please visit https://google-research.github.io/proteinfer/.

Subject(s)

Algorithms , Neural Networks, Computer , Proteins/genetics , Proteins/chemistry , Amino Acid Sequence , Software , Computational Biology/methods
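
The ProteInfer abstract above describes convolutional networks that map an unaligned amino acid sequence to several function labels at once. The sketch below illustrates that general idea only. It is not the authors' architecture or code: the filter width, filter count, label count, random weights, and every function name are hypothetical choices made for illustration, and nothing here is trained. It one-hot encodes a sequence, applies one 1D convolution with ReLU, max-pools over positions to get a fixed-size vector, and scores each label with an independent sigmoid.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(sequence):
    """Encode each residue as a 20-dimensional indicator vector."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = []
    for residue in sequence.upper():
        row = [0.0] * len(AMINO_ACIDS)
        if residue in index:  # unknown residues stay all-zero
            row[index[residue]] = 1.0
        encoded.append(row)
    return encoded


def conv1d_relu(inputs, filters, width):
    """Valid 1D convolution over sequence positions, followed by ReLU.

    filters[f][k][c] is the weight of filter f at window offset k for
    input channel c. The sequence must be at least `width` residues long.
    """
    outputs = []
    for start in range(len(inputs) - width + 1):
        position = []
        for filt in filters:
            total = 0.0
            for k in range(width):
                for c, value in enumerate(inputs[start + k]):
                    total += filt[k][c] * value
            position.append(max(0.0, total))
        outputs.append(position)
    return outputs


def global_max_pool(features):
    """Collapse variable-length features into one fixed-size vector."""
    return [max(column) for column in zip(*features)]


def predict_labels(sequence, filters, width, label_weights):
    """Return the pooled embedding and one sigmoid probability per label."""
    embedding = global_max_pool(conv1d_relu(one_hot(sequence), filters, width))
    probs = []
    for weights in label_weights:
        logit = sum(w * x for w, x in zip(weights, embedding))
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return embedding, probs


# Hypothetical sizes and untrained random weights, for illustration only.
rng = random.Random(0)
WIDTH, N_FILTERS, N_LABELS = 5, 8, 3
filters = [
    [[rng.uniform(-1, 1) for _ in AMINO_ACIDS] for _ in range(WIDTH)]
    for _ in range(N_FILTERS)
]
label_weights = [
    [rng.uniform(-1, 1) for _ in range(N_FILTERS)] for _ in range(N_LABELS)
]

embedding, probs = predict_labels(
    "MKTAYIAKQRQISFVKSHFSRQ", filters, WIDTH, label_weights
)
```

Because pooling runs over sequence positions, sequences of different lengths yield embeddings of the same size, which is one simple way to place full-length sequences in a shared vector space. Independent sigmoids, rather than a softmax, let a single protein receive several labels.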

MGnify: the microbiome sequence data analysis resource in 2023.

Richardson, Lorna; Allen, Ben; Baldi, Germana; Beracochea, Martin; Bileschi, Maxwell L; Burdett, Tony; Burgin, Josephine; Caballero-Pérez, Juan; Cochrane, Guy; Colwell, Lucy J; Curtis, Tom; Escobar-Zepeda, Alejandra; Gurbich, Tatiana A; Kale, Varsha; Korobeynikov, Anton; Raj, Shriya; Rogers, Alexander B; Sakharova, Ekaterina; Sanchez, Santiago; Wilkinson, Darren J; Finn, Robert D.

Nucleic Acids Res ; 51(D1): D753-D759, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36477304

ABSTRACT

The MGnify platform (https://www.ebi.ac.uk/metagenomics) facilitates the assembly, analysis and archiving of microbiome-derived nucleic acid sequences. The platform provides access to taxonomic assignments and functional annotations for nearly half a million analyses covering metabarcoding, metatranscriptomic, and metagenomic datasets, which are derived from a wide range of different environments. Over the past 3 years, MGnify has not only grown in terms of the number of datasets contained but also increased the breadth of analyses provided, such as the analysis of long-read sequences. The MGnify protein database now exceeds 2.4 billion non-redundant sequences predicted from metagenomic assemblies. This collection is now organised into a relational database making it possible to understand the genomic context of the protein through navigation back to the source assembly and sample metadata, marking a major improvement. To extend beyond the functional annotations already provided in MGnify, we have applied deep learning-based annotation methods. The technology underlying MGnify's Application Programming Interface (API) and website has been upgraded, and we have enabled the ability to perform downstream analysis of the MGnify data through the introduction of a coupled Jupyter Lab environment.

Subject(s)

Microbiota , Sequence Analysis , Genomics/methods , Metagenome , Metagenomics/methods , Microbiota/genetics , Software , Sequence Analysis/methods

InterPro in 2022.

Paysan-Lafosse, Typhaine; Blum, Matthias; Chuguransky, Sara; Grego, Tiago; Pinto, Beatriz Lázaro; Salazar, Gustavo A; Bileschi, Maxwell L; Bork, Peer; Bridge, Alan; Colwell, Lucy; Gough, Julian; Haft, Daniel H; Letunic, Ivica; Marchler-Bauer, Aron; Mi, Huaiyu; Natale, Darren A; Orengo, Christine A; Pandurangan, Arun P; Rivoire, Catherine; Sigrist, Christian J A; Sillitoe, Ian; Thanki, Narmada; Thomas, Paul D; Tosatto, Silvio C E; Wu, Cathy H; Bateman, Alex.

Nucleic Acids Res ; 51(D1): D418-D427, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36350672

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. Here, we report recent developments with InterPro (version 90.0) and its associated software, including updates to data content and to the website. These developments extend and enrich the information provided by InterPro, and provide a more user friendly access to the data. Additionally, we have worked on adding Pfam website features to the InterPro website, as the Pfam website will be retired in late 2022. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB. Moreover, we report the development of a card game as a method of engaging the non-scientific community. Finally, we discuss the benefits and challenges brought by the use of artificial intelligence for protein structure prediction.

Subject(s)

Databases, Protein , Humans , Amino Acid Sequence , Artificial Intelligence , Internet , Proteins/chemistry , Software

Using deep learning to annotate the protein universe.

Bileschi, Maxwell L; Belanger, David; Bryant, Drew H; Sanderson, Theo; Carter, Brandon; Sculley, D; Bateman, Alex; DePristo, Mark A; Colwell, Lucy J.

Nat Biotechnol ; 40(6): 932-937, 2022 06.

Article in English | MEDLINE | ID: mdl-35190689

ABSTRACT

Understanding the relationship between amino acid sequence and protein function is a long-standing challenge with far-reaching scientific and translational implications. State-of-the-art alignment-based techniques cannot predict function for one-third of microbial protein sequences, hampering our ability to exploit data from diverse organisms. Here, we train deep learning models to accurately predict functional annotations for unaligned amino acid sequences across rigorous benchmark assessments built from the 17,929 families of the protein families database Pfam. The models infer known patterns of evolutionary substitutions and learn representations that accurately cluster sequences from unseen families. Combining deep models with existing methods significantly improves remote homology detection, suggesting that the deep models learn complementary information. This approach extends the coverage of Pfam by >9.5%, exceeding additions made over the last decade, and predicts function for 360 human reference proteome proteins with no previous Pfam annotation. These results suggest that deep learning models will be a core component of future protein annotation tools.

Subject(s)

Deep Learning , Amino Acid Sequence , Databases, Protein , Humans , Molecular Sequence Annotation , Proteome/metabolism , Proteomics
