Results 1 - 20 of 44
1.
Sci Adv ; 10(21): eadj6823, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38781323

ABSTRACT

Using ancient DNA recovered from a fossil bone from the South Island, we present a draft genome of the little bush moa (Anomalopteryx didiformis), one of approximately nine species of extinct flightless birds from Aotearoa, New Zealand. We recover a complete mitochondrial genome at 249.9× depth of coverage and almost 900 megabases of a male moa nuclear genome at ~4 to 5× coverage, with sequence contiguity sufficient to identify more than 85% of avian universal single-copy orthologs. We describe a diverse landscape of transposable elements and satellite repeats, estimate a long-term effective population size of ~240,000, identify a diverse suite of olfactory receptor genes and an opsin repertoire with sensitivity in the ultraviolet range, show that the wingless moa phenotype is likely not attributable to gene loss or pseudogenization, and identify potential function-altering coding sequence variants in moa that could be synthesized for future functional assays. This genomic resource should support further studies of avian evolution and morphological divergence.
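The abstract does not say how the ~240,000 figure was obtained. One common back-of-envelope relation for a diploid autosomal genome under the neutral model is θ = 4Nₑμ, so Nₑ ≈ π / (4μ), where π is per-site heterozygosity and μ is the per-site, per-generation mutation rate. The sketch below only illustrates that relation; the input values are hypothetical, not taken from the moa study, which may well have used a different (for example, coalescent-based) method.

```python
def effective_population_size(heterozygosity: float, mutation_rate: float) -> float:
    """Estimate long-term Ne from per-site heterozygosity, assuming theta = 4 * Ne * mu.

    Both inputs are per-site values; mutation_rate is per generation.
    This is an illustrative relation, not the method used in the study.
    """
    if mutation_rate <= 0:
        raise ValueError("mutation_rate must be positive")
    if heterozygosity < 0:
        raise ValueError("heterozygosity must be non-negative")
    return heterozygosity / (4.0 * mutation_rate)


# Hypothetical inputs for illustration only (not values from the moa paper).
print(round(effective_population_size(heterozygosity=0.001, mutation_rate=1e-9)))  # 250000
```

Halving the assumed mutation rate doubles the estimate, so any such figure is only as reliable as the mutation rate and generation assumptions behind it.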


Subject(s)
Birds , Extinction, Biological , Genome , Animals , Birds/genetics , Cell Nucleus/genetics , Phylogeny , Fossils , Genome, Mitochondrial , Flight, Animal , New Zealand , Male , DNA Transposable Elements/genetics , Genomics/methods
2.
Nat Cell Biol ; 25(6): 865-876, 2023 06.
Article in English | MEDLINE | ID: mdl-37169880

ABSTRACT

Elucidating the mechanisms of ageing and identifying methods to control it have long been anticipated. Recently, two factors associated with ageing, the accumulation of senescent cells and changes in the composition of the gut microbiota, have been shown to play key roles in ageing. However, little is known about how these phenomena arise and how they are related during ageing. Here we show that the persistent presence of commensal bacteria gradually induces cellular senescence in gut germinal centre B cells. Importantly, this reduces both the production and the diversity of immunoglobulin A (IgA) antibodies that target gut bacteria, thereby changing the composition of the gut microbiota in aged mice. These results reveal IgA-mediated crosstalk between the gut microbiota and cellular senescence, extending our understanding of how the gut microbiota changes with age and opening up possibilities for controlling these changes.


Subject(s)
Gastrointestinal Microbiome , Animals , Mice , Bacteria , Immunoglobulin A , Cellular Senescence , B-Lymphocytes
3.
Biophys Rev ; 14(6): 1247-1253, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36536641

ABSTRACT

Structural genomics began as a global effort in the 1990s to determine the tertiary structures of all protein families in response to large-scale genome sequencing projects. The immediate outcome was an influx of tens of thousands of protein structures, many of which had unknown functions. At the time, the value of structural genomics was controversial. However, the structures themselves were only the most obvious output. These newly solved structures also motivated the emergence of large data science and infrastructure efforts, which, together with advances in deep learning, have brought about a revolution in computational molecular biology. Here, we review some of the computational research carried out at the Protein Data Bank Japan (PDBj) during the Protein 3000 project under the leadership of Haruki Nakamura, much of which continues to flourish today.

4.
Sci Transl Med ; 14(650): eabn7737, 2022 06 22.
Article in English | MEDLINE | ID: mdl-35471044

ABSTRACT

The Omicron (B.1.1.529) SARS-CoV-2 variant contains an unusually high number of mutations in the spike protein, raising concerns about escape from vaccines, convalescent serum, and therapeutic drugs. Here, we analyzed the degree to which Omicron pseudovirus evades neutralization by serum or therapeutic antibodies. Serum samples obtained 3 months after two doses of BNT162b2 vaccination exhibited 18-fold lower neutralization titers against Omicron than against the parental virus. Convalescent serum samples from individuals infected with the Alpha and Delta variants allowed similar frequencies of Omicron breakthrough infections. Domain-wise analysis using chimeric spike proteins revealed that this efficient evasion was achieved primarily by mutations clustered in the receptor binding domain, but that multiple mutations in the N-terminal domain also contributed. Omicron escaped a therapeutic cocktail of imdevimab and casirivimab, whereas sotrovimab, which targets a conserved region to avoid viral mutation, remained effective. Angiotensin-converting enzyme 2 (ACE2) decoys are another virus-neutralizing drug modality that is, at least in theory, free from complete escape. Deep mutational analysis demonstrated that an engineered ACE2 molecule prevented escape for each single-residue mutation in the receptor binding domain, similar to immunized serum. Engineered ACE2 neutralized Omicron comparably to the Wuhan strain and also showed a therapeutic effect against Omicron infection in hamsters and human ACE2 transgenic mice. As with previous SARS-CoV-2 variants, some sarbecoviruses showed high sensitivity to engineered ACE2, confirming its therapeutic value against diverse variants, including those that are yet to emerge.


Subject(s)
Angiotensin-Converting Enzyme 2 , COVID-19 , Animals , Antibodies, Monoclonal, Humanized , Antibodies, Neutralizing/therapeutic use , Antibodies, Viral/therapeutic use , BNT162 Vaccine , COVID-19/therapy , Humans , Immunization, Passive , Mice , Peptidyl-Dipeptidase A/chemistry , Peptidyl-Dipeptidase A/genetics , Peptidyl-Dipeptidase A/metabolism , SARS-CoV-2 , COVID-19 Serotherapy
5.
J Mol Evol ; 90(1): 73-94, 2022 02.
Article in English | MEDLINE | ID: mdl-35084522

ABSTRACT

Extant organisms commonly use 20 amino acids in protein synthesis. In the translation system, an aminoacyl-tRNA synthetase (ARS) selectively binds an amino acid and transfers it to the cognate tRNA. It is postulated that the amino acid repertoire of ARSs expanded during the development of the translation system. In this study, we generated composite phylogenetic trees for seven ARSs (SerRS, ProRS, ThrRS, GlyRS-1, HisRS, AspRS, and LysRS), which are thought to have diverged by gene duplication followed by mutation before the evolution of the last universal common ancestor. The composite phylogenetic tree shows that the AspRS/LysRS branch diverged from the other five ARSs at the deepest node, with the GlyRS/HisRS branch and the other three ARSs (ThrRS, ProRS, and SerRS) diverging at the second deepest node. ThrRS diverged next, and finally ProRS and SerRS diverged from each other. Based on the phylogenetic tree, we predicted the sequences of the ancestral ARSs prior to the evolution of the last universal common ancestor. We then inferred the amino acid specificity of each ancestral ARS by comparison with the amino acid recognition sites of ARSs from extant organisms. Our predictions indicate that the ancestral ARSs had substantial specificity and that the number of amino acid types aminoacylated by proteinaceous ARSs was limited before a fuller range of proteinaceous ARS species appeared. Assuming that 10 amino acid species are required for protein folding and function, proteinaceous ARSs possibly evolved within a translation system composed of preexisting ribozyme ARSs, before the evolution of the last universal common ancestor.
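The branching order described above can be written down as a rooted binary tree. The sketch below encodes only the topology stated in the abstract (no branch lengths or support values) as nested Python tuples and serializes it in Newick format; the names `ARS_TREE`, `to_newick` and `leaves` are illustrative and do not come from the study.

```python
from typing import List, Tuple, Union

# A tree is either a leaf name (str) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]

# Topology as described in the abstract:
# deepest split: AspRS/LysRS vs. the other five ARSs;
# second split: GlyRS-1/HisRS vs. ThrRS/ProRS/SerRS;
# then ThrRS vs. ProRS/SerRS.
ARS_TREE: Tree = (
    ("AspRS", "LysRS"),
    (("GlyRS-1", "HisRS"), ("ThrRS", ("ProRS", "SerRS"))),
)


def to_newick(tree: Tree) -> str:
    """Serialize a rooted binary tree of nested tuples as a Newick string."""
    def render(node: Tree) -> str:
        if isinstance(node, str):
            return node
        left, right = node
        return f"({render(left)},{render(right)})"
    return render(tree) + ";"


def leaves(tree: Tree) -> List[str]:
    """Return leaf names in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


print(to_newick(ARS_TREE))  # ((AspRS,LysRS),((GlyRS-1,HisRS),(ThrRS,(ProRS,SerRS))));
```

A Newick string like this can be loaded by most tree viewers, which makes it easy to check that a prose description of divergence order matches the intended topology.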


Subject(s)
Amino Acyl-tRNA Synthetases , Amino Acids/genetics , Amino Acyl-tRNA Synthetases/genetics , Amino Acyl-tRNA Synthetases/metabolism , Phylogeny , RNA, Transfer/metabolism
6.
BMC Biol ; 19(1): 217, 2021 09 29.
Article in English | MEDLINE | ID: mdl-34587965

ABSTRACT

BACKGROUND: DNA barcodes are a useful tool for discovering, understanding, and monitoring biodiversity, which are critical tasks at a time of rapid biodiversity loss. However, widespread adoption of barcodes requires cost-effective and simple barcoding methods. Here we present a workflow that satisfies these conditions. It was developed via "innovation through subtraction" and thus requires minimal lab equipment, can be learned within days, reduces the barcode sequencing cost to < 10 cents, and allows fast turnaround from specimen to sequence by using the portable MinION sequencer. RESULTS: We describe how tagged amplicons can be obtained and sequenced with the real-time MinION sequencer in many settings (field stations, biodiversity labs, citizen science labs, schools). We also provide amplicon coverage recommendations based on several runs of the latest generation of MinION flow cells ("R10.3"), which suggest that each run can generate barcodes for > 10,000 specimens. Next, we present new software, ONTbarcoder, which overcomes the bioinformatics challenges posed by MinION reads. The software is compatible with Windows 10, Macintosh, and Linux, has a graphical user interface (GUI), and can generate thousands of barcodes on a standard laptop within hours based on only two input files (FASTQ and a demultiplexing file). We document that MinION barcodes are virtually identical to Sanger and Illumina barcodes for the same specimens (> 99.99%) and provide evidence that MinION flow cells and reads have improved rapidly since 2018. CONCLUSIONS: We propose that barcoding with MinION is the way forward for government agencies, universities, museums, and schools because it combines low consumable and capital costs with scalability. Small projects can use the flow cell dongle ("Flongle"), while large projects can rely on MinION flow cells that can be stopped and reused after collecting sufficient data for a given project.
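The workflow above relies on tagged amplicons: each specimen's PCR product carries a unique combination of tags, and reads are assigned back to specimens by those tags (demultiplexing). The sketch below is a deliberately simplified, hypothetical illustration of that idea. It assumes fixed-length tags sitting exactly at the read ends, in the layout fwd_tag + insert + reverse-complement(rev_tag), and tolerates substitutions via Hamming distance. It is not how ONTbarcoder works; real MinION reads also contain indels and flanking adapter or primer sequence, which a production tool has to handle.

```python
from typing import Dict, List, Optional, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_tag(segment: str, tags: List[str], max_mismatch: int) -> Optional[str]:
    """Return the single tag within max_mismatch of segment, or None if zero or several match."""
    hits = [t for t in tags if len(t) == len(segment) and hamming(t, segment) <= max_mismatch]
    return hits[0] if len(hits) == 1 else None


def demultiplex(
    read: str,
    tag_pairs: Dict[Tuple[str, str], str],  # (fwd_tag, rev_tag) -> specimen ID
    tag_len: int,
    max_mismatch: int = 1,
) -> Optional[str]:
    """Assign a read to a specimen using the tags at both ends (hypothetical layout)."""
    fwd_tags = sorted({f for f, _ in tag_pairs})
    rev_tags = sorted({r for _, r in tag_pairs})
    for candidate in (read, revcomp(read)):  # reads may come from either strand
        f = best_tag(candidate[:tag_len], fwd_tags, max_mismatch)
        r = best_tag(revcomp(candidate[-tag_len:]), rev_tags, max_mismatch)
        if f is not None and r is not None and (f, r) in tag_pairs:
            return tag_pairs[(f, r)]
    return None


# Hypothetical 6-base tags and specimen IDs, for illustration only.
pairs = {("ACGTAC", "TTGACC"): "specimen_001", ("GGATCC", "TTGACC"): "specimen_002"}
read = "ACGTAC" + "AAAAAAAAAA" + revcomp("TTGACC")
print(demultiplex(read, pairs, tag_len=6))  # specimen_001
```

Requiring a unique tag hit, rather than taking the closest one, trades some yield for fewer mis-assigned reads.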


Subject(s)
Biodiversity , Computational Biology , DNA Barcoding, Taxonomic , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Software
7.
Commun Biol ; 4(1): 1134, 2021 09 22.
Article in English | MEDLINE | ID: mdl-34552191

ABSTRACT

The ability to predict emerging variants of SARS-CoV-2 would be of enormous value, as it would enable proactive design of vaccines in advance of such emergence. We estimated the diversity of each site in a multiple sequence alignment (MSA) of the Spike (S) proteins from close relatives of SARS-CoV-2 that infected bats and pangolins before the pandemic. We then compared the locations of high-diversity sites in this MSA with those of mutations found in multiple emerging lineages of human-infecting SARS-CoV-2. This comparison revealed a significant correspondence, which suggests that a limited number of sites in this protein are repeatedly substituted in different lineages of this group of viruses. It follows that the sites of future emerging mutations in SARS-CoV-2 can be predicted by analyzing its relatives (outgroups) that have infected non-human hosts. We discuss a possible evolutionary basis for these substitutions and provide a list of frequently substituted sites that may include the sites of future emerging variants in SARS-CoV-2.
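The abstract does not specify how per-site diversity was measured. Shannon entropy of each alignment column is one common choice, and the sketch below uses it as an assumption to illustrate how high-diversity sites could be flagged in an MSA. The threshold and the toy alignment are hypothetical.

```python
import math
from collections import Counter
from typing import List


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy (bits) of the residues in one MSA column, ignoring gaps."""
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())


def site_diversity(msa: List[str]) -> List[float]:
    """Per-site entropy for an MSA given as equal-length aligned sequences."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy("".join(seq[i] for seq in msa)) for i in range(length)]


def high_diversity_sites(msa: List[str], threshold: float) -> List[int]:
    """1-based alignment positions whose entropy exceeds threshold (hypothetical cut-off)."""
    return [i + 1 for i, h in enumerate(site_diversity(msa)) if h > threshold]


# Toy alignment for illustration; not real Spike sequences.
msa = ["MKVL", "MRVL", "MKIL", "MRAL"]
print(high_diversity_sites(msa, threshold=0.5))  # [2, 3]
```

Positions flagged this way in the outgroup alignment could then be intersected with positions mutated in emerging human lineages to test for the kind of correspondence the abstract reports.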


Subject(s)
Evolution, Molecular , SARS-CoV-2/genetics , Animals , Genome, Viral/genetics , Sequence Alignment
9.
Heliyon ; 7(2): e06317, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33665461

ABSTRACT

The oomycete genus Phytophthora includes devastating plant pathogens that are found in almost all ecosystems. We sequenced the genomes of two quarantined Phytophthora species, P. fragariae and P. rubi. Comparing these Phytophthora species and related genera allowed reconstruction of the phylogenetic relationships within the genus Phytophthora and revealed Phytophthora genomic features associated with infection and pathogenicity. We found that several hundred Phytophthora genes were putatively inherited from red algae, although Phytophthora does not have vestigial plastids originating from phototrophs. The horizontally transferred Phytophthora genes include abundant transposons that "transmit" exogenous genes to Phytophthora species, thereby creating opportunities for gene recombination. Several expansion events of Phytophthora gene families associated with cell wall biogenesis can be used as mutational targets to elucidate gene function in pathogenic interactions with host plants. This work enhances the understanding of Phytophthora evolution and should also help in the design of phytopathological control strategies.

10.
Methods Mol Biol ; 2231: 135-145, 2021.
Article in English | MEDLINE | ID: mdl-33289891

ABSTRACT

Long DNA and RNA reads from nanopore and PacBio technologies have many applications, but the raw reads have a substantial error rate. More accurate sequences can be obtained by merging multiple reads from overlapping parts of the same sequence. lamassemble aligns up to ∼1000 reads to each other and builds a consensus sequence, which is often much more accurate than the raw reads. It is useful for studying a region of interest such as an expanded tandem repeat or another disease-causing mutation.
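The core idea of merging reads, that independent errors are outvoted once reads are aligned to each other, can be shown with a column-wise majority vote. The sketch below assumes the reads are already in a gapped multiple alignment; it is a simplified stand-in for illustration and not lamassemble's actual alignment or consensus procedure.

```python
from collections import Counter
from typing import List


def majority_consensus(aligned_reads: List[str], gap: str = "-") -> str:
    """Column-wise majority-vote consensus of pre-aligned reads.

    Columns whose most common symbol is a gap are dropped, so the result
    is ungapped. Ties go to the symbol seen first in the column
    (Counter.most_common keeps first-encountered order for equal counts).
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(r) != length for r in aligned_reads):
        raise ValueError("reads must be aligned to a common length")
    consensus = []
    for i in range(length):
        symbol, _ = Counter(r[i] for r in aligned_reads).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


# Toy aligned reads, each carrying a different single error (insertion or substitution).
reads = ["ACGT-ACGT", "ACGTTACGT", "ACCT-ACGT", "ACGT-ACGA"]
print(majority_consensus(reads))  # ACGTACGT
```

Every read above is wrong somewhere, yet the consensus is clean, which is why a consensus is often much more accurate than any single raw read.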


Subject(s)
Consensus Sequence , Genomics/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , Animals , Genetic Techniques , High-Throughput Nucleotide Sequencing , Humans , Nanopores
11.
Methods Mol Biol ; 2231: 163-177, 2021.
Article in English | MEDLINE | ID: mdl-33289893

ABSTRACT

The Database of Aligned Structural Homologs (DASH) is a tool for efficiently navigating the Protein Data Bank (PDB) by means of pre-computed pairwise structural alignments. We recently showed that, by integrating DASH structural alignments with the multiple sequence alignment (MSA) software MAFFT, we were able to significantly improve MSA accuracy without dramatically increasing manual or computational complexity. In the latest DASH update, such queries are not limited to PDB entries but can also be launched from user-provided protein coordinates. Here, we describe a further extension of DASH that retrieves the intermolecular interactions of all PDB domains that are structurally similar to a query domain of interest. We illustrate these new features using a model of the NYN domain of the ribonuclease N4BP1 as an example. We show that the protein-nucleotide interactions returned are distributed asymmetrically on the surface of the NYN domain, roughly centered on the known nuclease active site.


Subject(s)
RNA-Binding Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Software , Algorithms , Amino Acid Sequence , Computational Biology , Databases, Protein , Nuclear Proteins/chemistry , Protein Binding , Protein Domains , Ribonucleases/chemistry
12.
Front Microbiol ; 11: 2112, 2020.
Article in English | MEDLINE | ID: mdl-33042039

ABSTRACT

The SARS-CoV-2 S protein is a major point of interaction between the virus and the human immune system. As a consequence, the S protein is not a static target but undergoes rapid molecular evolution. In order to more fully understand the selection pressure during evolution, we examined residue positions in the S protein that vary greatly across closely related viruses but are conserved in the subset of viruses that infect humans. These "evolutionarily important" residues were not distributed evenly across the S protein but were concentrated in two domains: the N-terminal domain and the receptor-binding domain, both of which play a role in host cell binding in a number of related viruses. In addition to being localized in these two domains, evolutionary importance correlated with structural flexibility and inversely correlated with distance from known or predicted host receptor-binding residues. Finally, we observed a bias in the composition of the amino acids that make up such residues toward more human-like, rather than virus-like, sequence motifs.
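The selection criterion described above (variable across related viruses, conserved among the human-infecting subset) can be sketched as a filter over alignment columns. The abstract does not say how "vary greatly" was quantified, so the minimum number of distinct residues (`min_states`) used here is an assumption, as are the toy sequences; the study itself may use an entropy-based or other evolutionary score.

```python
from typing import List, Sequence


def evolutionarily_important_sites(
    all_seqs: Sequence[str],
    human_seqs: Sequence[str],
    min_states: int = 3,  # hypothetical cut-off for "varies greatly"
    gap: str = "-",
) -> List[int]:
    """1-based columns that carry at least min_states distinct residues across
    all related viruses but exactly one residue in the human-infecting subset.

    human_seqs is assumed to be a subset of all_seqs, in the same alignment.
    """
    length = len(all_seqs[0])
    for s in list(all_seqs) + list(human_seqs):
        if len(s) != length:
            raise ValueError("all sequences must share one alignment length")
    sites = []
    for i in range(length):
        states_all = {s[i] for s in all_seqs} - {gap}
        states_human = {s[i] for s in human_seqs} - {gap}
        if len(states_all) >= min_states and len(states_human) == 1:
            sites.append(i + 1)
    return sites


# Toy alignment for illustration; not real S protein sequences.
human_seqs = ["MKVLS", "MKVLT"]
all_seqs = human_seqs + ["MRVLA", "MEILS"]
print(evolutionarily_important_sites(all_seqs, human_seqs))  # [2]
```

Position 5 is excluded because it also varies among the human-infecting sequences, and position 3 is excluded only because it falls below the assumed `min_states` threshold.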

13.
Genome Med ; 12(1): 67, 2020 07 31.
Article in English | MEDLINE | ID: mdl-32731881

ABSTRACT

BACKGROUND: Many genetic/genomic disorders are caused by genomic rearrangements. Standard methods can often characterize these variations only partly, e.g., copy number changes or breakpoints. To determine the pathogenicity of the rearrangements, it is important to fully understand the order and orientation of the rearranged fragments, with precise breakpoints. METHODS: We performed whole-genome-coverage nanopore sequencing of long DNA reads from four patients with chromosomal translocations. Using our newly developed analysis pipeline, we identified rearrangements relative to a reference human genome, subtracted rearrangements shared by any of 33 control individuals, and determined the order and orientation of the rearranged fragments. RESULTS: We describe the full characterization of complex chromosomal rearrangements: filtering out genomic rearrangements seen in controls without the same disease reduced the number of loci per patient from a few thousand to a few dozen. Breakpoint detection was very accurate; differences from Sanger sequencing-confirmed breakpoints were usually ~ 0 ± 1 base. For one patient with two reciprocal chromosomal translocations, we find that the translocation points have complex rearrangements of multiple DNA fragments involving 5 chromosomes, which we could order and orient by an automatic algorithm, thereby fully reconstructing the rearrangement. A rearrangement is more than the sum of its parts: some properties, such as sequence loss, can be inferred only after reconstructing the whole rearrangement. In this patient, the rearrangements were evidently caused by shattering of the chromosomes into multiple fragments, which rejoined in a different order and orientation with loss of some fragments. CONCLUSIONS: We developed an effective analytic pipeline that finds chromosomal aberrations in congenital diseases using only long-read sequencing, by filtering out benign changes. Our algorithm for reconstructing complex rearrangements is useful for interpreting rearrangements with many breakpoints, e.g., chromothripsis. Our approach promises to fully characterize many congenital germline rearrangements, provided they do not involve poorly understood loci such as centromeric repeats.
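The control-subtraction step described above can be sketched as follows. Here each rearrangement is summarized, as a simplifying assumption, by the two reference positions it joins, and a patient rearrangement is discarded if any control rearrangement joins the same two loci within a tolerance. The data classes, the tolerance `slop` and the toy coordinates are all hypothetical; this is not the authors' pipeline, which works from read alignments and also tracks orientation.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Breakpoint:
    """One end of a rearrangement on the reference genome."""
    chrom: str
    pos: int


@dataclass(frozen=True)
class Rearrangement:
    """A novel adjacency that joins two reference positions."""
    left: Breakpoint
    right: Breakpoint


def _near(a: Breakpoint, b: Breakpoint, slop: int) -> bool:
    """True if two breakpoints lie on the same chromosome within slop bases."""
    return a.chrom == b.chrom and abs(a.pos - b.pos) <= slop


def same_event(x: Rearrangement, y: Rearrangement, slop: int) -> bool:
    """True if both ends of x match both ends of y, in either order."""
    return (_near(x.left, y.left, slop) and _near(x.right, y.right, slop)) or (
        _near(x.left, y.right, slop) and _near(x.right, y.left, slop)
    )


def subtract_controls(
    patient: Iterable[Rearrangement],
    controls: Iterable[Rearrangement],
    slop: int = 1000,  # hypothetical tolerance in bases
) -> List[Rearrangement]:
    """Keep only patient rearrangements not matched by any control rearrangement."""
    control_list = list(controls)
    return [p for p in patient if not any(same_event(p, c, slop) for c in control_list)]


# Toy coordinates for illustration only.
patient = [
    Rearrangement(Breakpoint("chr8", 100_000), Breakpoint("chr18", 500_000)),
    Rearrangement(Breakpoint("chr1", 10_000), Breakpoint("chr1", 90_000)),
]
controls = [Rearrangement(Breakpoint("chr1", 10_200), Breakpoint("chr1", 89_900))]
print(subtract_controls(patient, controls))  # only the chr8-chr18 junction remains
```

The pairwise comparison is quadratic, which is fine for a sketch; with thousands of loci per genome, indexing control breakpoints by chromosome and position would be the obvious next step.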


Subject(s)
Gene Rearrangement , Genome-Wide Association Study , Germ-Line Mutation , Chromosome Aberrations , Chromosome Breakpoints , Genetic Association Studies/methods , Genetic Predisposition to Disease , Genome, Human , Genomics/methods , High-Throughput Nucleotide Sequencing , Humans , Translocation, Genetic , Whole Genome Sequencing
14.
J Hum Genet ; 65(8): 667-674, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32296131

ABSTRACT

Chromothripsis is a type of chaotic, complex genomic rearrangement caused by a single event of chromosomal shattering and repair. Chromothripsis is known to cause rare congenital diseases when it occurs in germline cells; however, current genome analysis technologies have difficulty detecting and deciphering it. As a result, this type of complex rearrangement may be overlooked in rare-disease patients whose genetic diagnosis is unsolved. We applied long-read nanopore sequencing and our recently developed analysis pipeline, dnarrange, to a patient who has a reciprocal chromosomal translocation t(8;18)(q22;q21) resulting from chromothripsis between the two chromosomes, and fully characterized the complex rearrangements at the translocation site. The patient's genome was evidently shattered into 19 fragments, which rejoined into derivative chromosomes in a random order and orientation. The reconstructed patient genome indicates loss of five genomic regions, all of which overlap with microarray-detected copy number losses. We found that two disease-related genes, RAD21 and EXT1, were lost through chromothripsis. Loss of these two genes could fully explain the disease phenotype of facial dysmorphisms and bone abnormalities, which is likely a contiguous gene syndrome combining Cornelia de Lange syndrome type IV (CdLs-4) and atypical Langer-Giedion syndrome (LGS), also known as trichorhinophalangeal syndrome type II (TRPSII). This provides evidence that our approach based on long-read sequencing can fully characterize chromothripsis in a patient's genome, which is important for understanding the phenotype of diseases caused by complex genomic rearrangement.
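Once the fragments of a shattered region are ordered and oriented on the derivative chromosomes, lost sequence falls out as the fragments that appear on none of them. The sketch below illustrates only that bookkeeping step with hypothetical fragment labels (A to F) rather than the 19 fragments reported here; it assumes the fragments and their placement have already been inferred, which is the hard part that dnarrange addresses.

```python
from typing import Dict, List, Set, Tuple

# A placed fragment is (name, orientation): "+" = reference direction, "-" = inverted.
Placed = Tuple[str, str]


def lost_fragments(reference_order: List[str], derivatives: Dict[str, List[Placed]]) -> List[str]:
    """Fragments present in the reference but absent from every derivative chromosome."""
    kept: Set[str] = {name for placed in derivatives.values() for name, _ in placed}
    return [name for name in reference_order if name not in kept]


def describe(derivative: List[Placed]) -> str:
    """Render a derivative chromosome as, e.g., 'A+ D- B+'."""
    return " ".join(f"{name}{strand}" for name, strand in derivative)


# Hypothetical toy reconstruction, for illustration only.
reference_order = ["A", "B", "C", "D", "E", "F"]
derivatives = {
    "der(8)": [("A", "+"), ("D", "-"), ("B", "+")],
    "der(18)": [("F", "+"), ("C", "-")],
}
print(lost_fragments(reference_order, derivatives))  # ['E']
print(describe(derivatives["der(8)"]))  # A+ D- B+
```

Note that fragment E cannot be called lost by looking at either derivative alone; the loss is only evident once both derivative chromosomes are reconstructed.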


Subject(s)
Cell Cycle Proteins/genetics , Chromothripsis , DNA-Binding Proteins/genetics , De Lange Syndrome/genetics , Langer-Giedion Syndrome/genetics , N-Acetylglucosaminyltransferases/genetics , Child , Chromosome Deletion , De Lange Syndrome/diagnosis , De Lange Syndrome/physiopathology , Genome , Humans , Langer-Giedion Syndrome/diagnosis , Langer-Giedion Syndrome/physiopathology , Male , Nanopore Sequencing , Phenotype , Sequence Analysis, DNA , Translocation, Genetic
15.
J Hum Genet ; 65(5): 475-480, 2020 May.
Article in English | MEDLINE | ID: mdl-32066831

ABSTRACT

Recently, a recessively inherited intronic repeat expansion in replication factor C1 (RFC1) was identified in cerebellar ataxia with neuropathy and bilateral vestibular areflexia syndrome (CANVAS). Here, we describe a Japanese case of genetically confirmed CANVAS with autonomic failure and auditory hallucination. The case showed impaired uptake of iodine-123-metaiodobenzylguanidine and 123I-ioflupane in the cardiac sympathetic nerve and dopaminergic neurons, respectively, by single-photon emission computed tomography. Long-read sequencing identified biallelic pathogenic (AAGGG)n nucleotide repeat expansion in RFC1 and heterozygous benign (TAAAA)n and (TAGAA)n expansions in brain expressed, associated with NEDD4 (BEAN1). Enrichment of the repeat regions in RFC1 and BEAN1 using a Cas9-mediated system clearly distinguished between pathogenic and benign repeat expansions. The haplotype around RFC1 indicated that the (AAGGG)n expansion in our case was on the same ancestral allele as that of European cases. Thus, long-read sequencing facilitates precise genetic diagnosis of diseases with complex repeat structures and various expansions.
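Distinguishing repeat expansions by motif, as described above, amounts to asking which repeat unit makes up the expanded tract. The sketch below counts the longest uninterrupted run of each motif named in the abstract ((AAGGG)n in RFC1; (TAAAA)n and (TAGAA)n in BEAN1) in an already extracted sequence. It is a hypothetical illustration with exact matching only; real nanopore reads contain errors, so exact-match runs would undercount, and no pathogenicity threshold is implied.

```python
from typing import Dict, Iterable


def longest_motif_run(seq: str, motif: str) -> int:
    """Longest uninterrupted tandem run of motif in seq, counted in repeat units.

    Every reading frame is tried, so runs need not start at a multiple of len(motif).
    """
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    if k == 0:
        raise ValueError("motif must be non-empty")
    best = 0
    for start in range(k):
        run = 0
        for i in range(start, len(seq) - k + 1, k):
            if seq[i:i + k] == motif:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


def motif_profile(seq: str, motifs: Iterable[str]) -> Dict[str, int]:
    """Longest run (in units) for each candidate motif."""
    return {m: longest_motif_run(seq, m) for m in motifs}


# Toy sequence for illustration; not a real RFC1 or BEAN1 read.
seq = "TTT" + "AAGGG" * 5 + "C" + "TAAAA" * 3
print(motif_profile(seq, ("AAGGG", "TAAAA", "TAGAA")))  # {'AAGGG': 5, 'TAAAA': 3, 'TAGAA': 0}
```

Reporting the motif composition, and not just the total tract length, is what allows two expansions of similar size to be told apart.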


Subject(s)
Bilateral Vestibulopathy/genetics , Cerebellar Ataxia/genetics , DNA Repeat Expansion , Replication Protein C/genetics , Sequence Analysis, DNA , Aged, 80 and over , Asian People , Bilateral Vestibulopathy/diagnosis , Cerebellar Ataxia/diagnosis , Female , Humans , Japan , Nedd4 Ubiquitin Protein Ligases/genetics
16.
Methods Mol Biol ; 2048: 207-229, 2019.
Article in English | MEDLINE | ID: mdl-31396940

ABSTRACT

Structural modeling plays a key role in protein function prediction on a genome-wide scale. For B and T lymphocyte receptors, the critical functional question is: which antigens and epitopes are targeted? With emerging B cell receptor (BCR) and T cell receptor (TCR) sequencing methods improving in both breadth and depth, there is a growing need for methods that can help answer this question. Since lymphocyte-antigen recognition depends on complementarity, structural modeling is likely to play an important role in understanding antigen specificity and affinity. In the case of BCRs, such modeling methods have a long history in the study and design of antibodies. However, for TCRs there are relatively few publicly available modeling tools, and, to our knowledge, none that incorporate interaction between TCRs and peptide-MHC (pMHC) complexes. Here, we provide a web-based tool, ImmuneScape ( https://sysimm.org/immune-scape/ ), to carry out TCR-pMHC modeling as a first step toward structure-based function prediction.


Subject(s)
HLA Antigens/metabolism , Molecular Docking Simulation , Molecular Dynamics Simulation , Receptors, Antigen, T-Cell/metabolism , T-Lymphocytes/metabolism , Alleles , Epitope Mapping/methods , Epitopes, T-Lymphocyte/genetics , Epitopes, T-Lymphocyte/immunology , Epitopes, T-Lymphocyte/metabolism , HLA Antigens/genetics , HLA Antigens/immunology , Humans , Receptors, Antigen, T-Cell/genetics , Receptors, Antigen, T-Cell/immunology , Sequence Alignment , Software , Structure-Activity Relationship , T-Lymphocytes/immunology
17.
Nat Genet ; 51(8): 1215-1221, 2019 08.
Article in English | MEDLINE | ID: mdl-31332381

ABSTRACT

Neuronal intranuclear inclusion disease (NIID) is a progressive neurodegenerative disease that is characterized by eosinophilic hyaline intranuclear inclusions in neuronal and somatic cells. The wide range of clinical manifestations in NIID makes ante-mortem diagnosis difficult [1-8], but skin biopsy enables ante-mortem diagnosis [9-12]. The average age at onset is 59.7 years among approximately 140 NIID cases, which are mostly sporadic, with several familial cases. By linkage mapping of a large NIID family with several affected members (Family 1), we identified a 58.1 Mb linked region at 1p22.1-q21.3 with a maximum logarithm of the odds (LOD) score of 4.21. By long-read sequencing, we identified a GGC repeat expansion in the 5' region of NOTCH2NLC (Notch 2 N-terminal like C) in all affected family members. Furthermore, we found similar expansions in 8 unrelated families with NIID and 40 sporadic NIID cases. We observed abnormal antisense transcripts in fibroblasts from patients but not from unaffected individuals. This work shows that repeat expansion in human-specific NOTCH2NLC, a gene that evolved by segmental duplication, causes a human disease.
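The logarithm of the odds (LOD) score compares the likelihood of the pedigree data at a recombination fraction θ with the likelihood under no linkage (θ = 0.5). The sketch below computes it for the simplest textbook case, phase-known and fully informative meioses, where LOD(θ) = log10[θ^R (1 − θ)^(N − R) / 0.5^N] for R recombinants among N meioses. A real family analysis like the one above typically uses a disease model and pedigree-likelihood software, so this is an illustration of the score, not a reproduction of the reported 4.21; a LOD above 3 is the conventional threshold for significant linkage.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10(theta**R * (1 - theta)**(N - R)) - N * log10(0.5)
    """
    if not 0 <= recombinants <= meioses:
        raise ValueError("need 0 <= recombinants <= meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    likelihood = theta ** recombinants * (1.0 - theta) ** (meioses - recombinants)
    if likelihood == 0.0:
        return float("-inf")  # the observed recombinants are impossible at this theta
    return math.log10(likelihood) - meioses * math.log10(0.5)


# Ten informative meioses with no recombinants, evaluated at theta = 0.
print(round(lod_score(0, 10, 0.0), 2))  # 3.01
```

Each fully informative, non-recombinant meiosis adds log10(2) ≈ 0.3 at θ = 0, so about ten such meioses are needed to pass the threshold of 3 in this idealized setting.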


Subject(s)
Brain/pathology , High-Throughput Nucleotide Sequencing/methods , Linkage Disequilibrium , Neurodegenerative Diseases/genetics , Neurodegenerative Diseases/pathology , Receptors, Notch/genetics , Trinucleotide Repeat Expansion/genetics , Adolescent , Adult , Aged , Brain/metabolism , Case-Control Studies , Female , Genetic Markers/genetics , Humans , Intranuclear Inclusion Bodies/genetics , Intranuclear Inclusion Bodies/pathology , Male , Middle Aged , Pedigree , Receptors, Notch/metabolism , Young Adult
18.
Nucleic Acids Res ; 47(W1): W5-W10, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31062021

ABSTRACT

Here, we describe a web server that integrates structural alignments with the MAFFT multiple sequence alignment (MSA) tool. For this purpose, we have prepared a web-based Database of Aligned Structural Homologs (DASH), which provides structural alignments at the domain and chain levels for all proteins in the Protein Data Bank (PDB), and can be queried interactively or by a simple REST-like API. MAFFT-DASH integration can be invoked with a single flag on either the web (https://mafft.cbrc.jp/alignment/server/) or command-line versions of MAFFT. In our benchmarks using 878 cases from the BAliBase, HomFam, OXFam, Mattbench and SISYPHUS datasets, MAFFT-DASH showed 10-20% improvement over standard MAFFT for MSA problems with weak similarity, in terms of Sum-of-Pairs (SP), a measure of how well a program succeeds at aligning input sequences in comparison to a reference alignment. When MAFFT alignments were supplemented with homologous sequences, further improvement was observed. Potential applications of DASH beyond MSA enrichment include functional annotation through detection of remote homology and assembly of template libraries for homology modeling.
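The Sum-of-Pairs (SP) measure mentioned above is usually computed as the fraction of residue pairs aligned in the reference alignment that are also aligned in the test alignment. The sketch below implements that definition over all columns. Benchmarks such as BAliBASE often restrict scoring to annotated core regions, which this sketch does not do; the toy alignments are hypothetical.

```python
from itertools import combinations
from typing import List, Set, Tuple

# (sequence index a, residue index in a, sequence index b, residue index in b)
Pair = Tuple[int, int, int, int]


def aligned_pairs(msa: List[str], gap: str = "-") -> Set[Pair]:
    """Collect every pair of residues that share a column in the alignment.

    Residues are identified by their position in the ungapped sequence,
    so pairs can be compared across different alignments of the same sequences.
    """
    next_residue = [0] * len(msa)
    pairs: Set[Pair] = set()
    for col in range(len(msa[0])):
        present = []
        for s, row in enumerate(msa):
            if row[col] != gap:
                present.append((s, next_residue[s]))
                next_residue[s] += 1
        # combinations() preserves sequence order, so each pair has a < b.
        for (sa, ra), (sb, rb) in combinations(present, 2):
            pairs.add((sa, ra, sb, rb))
    return pairs


def ungapped(msa: List[str], gap: str = "-") -> List[str]:
    """Strip gaps from every row."""
    return [row.replace(gap, "") for row in msa]


def sp_score(test: List[str], reference: List[str], gap: str = "-") -> float:
    """Fraction of reference residue pairs that the test alignment also aligns."""
    if ungapped(test, gap) != ungapped(reference, gap):
        raise ValueError("test and reference must align the same sequences in the same order")
    ref_pairs = aligned_pairs(reference, gap)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test, gap)) / len(ref_pairs)


reference = ["AC-GT", "ACTGT"]
test = ["ACG-T", "ACTGT"]
print(sp_score(test, reference))  # 0.75
```

Here the test alignment misplaces the G of the first sequence, losing one of the four reference pairs.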


Subject(s)
Amino Acid Sequence/genetics , Proteins/genetics , Sequence Alignment/methods , Software , Algorithms , Databases, Protein , Humans , Sequence Analysis, Protein/methods , Sequence Analysis, RNA , Sequence Homology
19.
Brief Bioinform ; 20(4): 1160-1166, 2019 07 19.
Article in English | MEDLINE | ID: mdl-28968734

ABSTRACT

This article describes several features in the MAFFT online service for multiple sequence alignment (MSA). As a result of recent advances in sequencing technologies, huge numbers of biological sequences are available and the need for MSAs with large numbers of sequences is increasing. To extract biologically relevant information from such data, sophisticated algorithms are necessary but not sufficient. Intuitive and interactive tools that let experimental biologists handle large data semiautomatically are becoming important. We are developing MAFFT in these two directions. Here, we explain (i) the Web interface for recently developed options for large data and (ii) interactive usage to refine sequence data sets and MSAs.


Subject(s)
Sequence Alignment/methods , Software , Algorithms , Computational Biology/methods , Databases, Genetic , Internet , Sequence Alignment/statistics & numerical data , Sequence Analysis , User-Computer Interface
20.
Bioinformatics ; 34(14): 2490-2492, 2018 07 15.
Article in English | MEDLINE | ID: mdl-29506019

ABSTRACT

Summary: We report an update for the MAFFT multiple sequence alignment program to enable parallel calculation of large numbers of sequences. The G-INS-1 option of MAFFT was recently reported to have higher accuracy than other methods for large data, but this method has been impractical for most large-scale analyses, due to the requirement of large computational resources. We introduce a scalable variant, G-large-INS-1, which has equivalent accuracy to G-INS-1 and is applicable to 50 000 or more sequences. Availability and implementation: This feature is available in MAFFT versions 7.355 or later at https://mafft.cbrc.jp/alignment/software/mpi.html. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods , Sequence Alignment/methods , Software , Algorithms , Protein Structure, Secondary , Sequence Analysis, Protein/methods , Sequence Analysis, RNA/methods