Results 1 - 14 of 14
1.
Wellcome Open Res ; 8: 144, 2023.
Article in English | MEDLINE | ID: mdl-38026731

ABSTRACT

We present a genome assembly from an individual male Incurvaria masculella (the Feathered Bright; Arthropoda; Insecta; Lepidoptera; Incurvariidae). The genome sequence is 552 megabases in span. Most of the assembly is scaffolded into 26 chromosomal pseudomolecules, including the assembled Z sex chromosome. The mitochondrial genome has also been assembled and is 15.3 kilobases in length.

2.
Wellcome Open Res ; 8: 103, 2023.
Article in English | MEDLINE | ID: mdl-37799508

ABSTRACT

We present a genome assembly from an individual male Stenoptilia bipunctidactyla (the Twin-spot Plume; Arthropoda; Insecta; Lepidoptera; Pterophoridae). The genome sequence is 822.9 megabases in span. Most of the assembly is scaffolded into 30 chromosomal pseudomolecules, including the Z sex chromosome. The mitochondrial genome has also been assembled and is 17.8 kilobases in length. Gene annotation of this assembly on Ensembl has identified 22,137 protein coding genes.

3.
Wellcome Open Res ; 8: 32, 2023.
Article in English | MEDLINE | ID: mdl-37822564

ABSTRACT

We present a genome assembly from an individual Ypsolopha sequella (the Pied Smudge; Arthropoda; Insecta; Lepidoptera; Ypsolophidae). The genome sequence is 867 megabases in span. Most of the assembly is scaffolded into 30 chromosomal pseudomolecules with the Z sex chromosome assembled. The mitochondrial genome has also been assembled and is 15.3 kilobases in length. Gene annotation of this assembly on Ensembl identified 20,394 protein coding genes.

4.
Wellcome Open Res ; 8: 109, 2023.
Article in English | MEDLINE | ID: mdl-37840882

ABSTRACT

We present a genome assembly from an individual male Sesia bembeciformis (the Lunar Hornet; Arthropoda; Insecta; Lepidoptera; Sesiidae). The genome sequence is 477.1 megabases in span. Most of the assembly is scaffolded into 31 chromosomal pseudomolecules, including the Z sex chromosome. The mitochondrial genome has also been assembled and is 16.1 kilobases in length. Gene annotation of this assembly on Ensembl has identified 15,843 protein coding genes.

5.
Artif Life ; 28(2): 173-204, 2022 Jun 28.
Article in English | MEDLINE | ID: mdl-35727997

ABSTRACT

We evolve floating point Sextic polynomial populations of genetic programming binary trees for up to a million generations. We observe continued innovation but this is limited by tree depth. We suggest that deep expressions are resilient to learning as they disperse information, impeding evolvability, and the adaptation of highly nested organisms, and we argue instead for open complexity. Programs with more than 2,000,000,000 instructions (depth 20,000) are created by crossover. To support unbounded long-term evolution experiments in genetic programming (GP), we use incremental fitness evaluation and both SIMD parallel AVX 512-bit instructions and 16 threads to yield performance equivalent to 1.1 trillion GP operations per second, 1.1 tera GPops, on an Intel Xeon Gold 6136 CPU 3.00GHz server.
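
As a rough illustration of what evaluating a genetic programming binary tree against a sextic polynomial target involves, here is a minimal sketch. It is not the authors' implementation: the target x^6 - 2x^4 + x^2 is assumed to be the usual sextic benchmark, the `Node` class, the test points and the mean-absolute-error fitness are hypothetical choices, and the recursive evaluator would not cope with the depth-20,000 trees described in the abstract, which need iterative or incremental evaluation.

```python
import operator
from dataclasses import dataclass
from typing import Union

# Binary arithmetic functions allowed inside a tree (an assumed function set).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


@dataclass(frozen=True)
class Node:
    """Hypothetical internal node: a binary operator with two subtrees."""
    op: str
    left: "Tree"
    right: "Tree"


# A tree is an internal node, a float constant, or the variable name "x".
Tree = Union[Node, float, str]


def evaluate(tree: Tree, x: float) -> float:
    """Recursively evaluate a tree at a single input value x."""
    if isinstance(tree, Node):
        return OPS[tree.op](evaluate(tree.left, x), evaluate(tree.right, x))
    if tree == "x":
        return x
    return float(tree)


def sextic(x: float) -> float:
    """Assumed target function: x^6 - 2x^4 + x^2."""
    return x**6 - 2 * x**4 + x**2


def fitness(tree: Tree, cases: list) -> float:
    """Mean absolute error over the test cases (lower is better)."""
    return sum(abs(evaluate(tree, x) - sextic(x)) for x in cases) / len(cases)


# Test points in [-1, 1] (an assumption for this sketch).
CASES = [i / 10 for i in range(-10, 11)]

# x*x * ((x*x - 1) * (x*x - 1)) expands to the assumed target exactly.
x2 = Node("*", "x", "x")
diff = Node("-", x2, 1.0)
perfect = Node("*", x2, Node("*", diff, diff))

# A deliberately poor individual for comparison: x + x.
poor = Node("+", "x", "x")
```

Calling `fitness(perfect, CASES)` and `fitness(poor, CASES)` shows how a selection step could rank the two individuals; a full evaluation like this over every test case is the cost that incremental evaluation is meant to reduce.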


Subject(s)
Algorithms , Software , Biological Evolution
6.
BioData Min ; 7: 3, 2014.
Article in English | MEDLINE | ID: mdl-24872843

ABSTRACT

BACKGROUND: In silico biology is increasingly important and is often based on public data. While the problem of contamination is well recognised in microbiology labs, the corresponding problem of database corruption has received less attention. RESULTS: Mapping 50 billion next generation DNA sequences from The Thousand Genome Project against published genomes reveals many that match one or more Mycoplasma but are not included in the reference human genome GRCh37.p5. Many of these are of low quality but NCBI BLAST searches confirm some high quality, high entropy sequences match Mycoplasma but no human sequences. CONCLUSIONS: It appears at least 7% of 1000G samples are contaminated.

7.
J Integr Bioinform ; 7(3), 2010 Mar 25.
Article in English | MEDLINE | ID: mdl-20375459

ABSTRACT

BACKGROUND: A chimeric transcript is a single RNA sequence which results from the transcription of two adjacent genes. Recent studies estimate that at least 4% of tandem human gene pairs may form chimeric transcripts. Affymetrix GeneChip data are used to study the expression patterns of tens of thousands of genes and the probe sequences used in these microarrays can potentially map to exotic RNA sequences such as chimeras. RESULTS: We have studied human chimeras and investigated their expression patterns using large surveys of Affymetrix microarray data obtained from the Gene Expression Omnibus. We show that for six probe sets, a unique probe mapping to a transcript produced by one of the adjacent genes can be used to identify the expression patterns of readthrough transcripts. Furthermore, unique probes mapping to an intergenic exon present only in the MASK-BP3 chimera can be used directly to study the expression levels of this transcript. CONCLUSIONS: We have attempted to implement a new method for identifying tandem chimerism. In this analysis unambiguous probes are needed to measure run-off transcription and probes that map to intergenic exons are particularly valuable for identifying the expression of chimeras.


Subject(s)
Data Collection , Gene Expression Profiling , Oligonucleotide Array Sequence Analysis , RNA, Messenger/genetics , DNA Probes/metabolism , Exons/genetics , Humans , Transcription, Genetic
8.
Brief Bioinform ; 10(3): 259-77, 2009 May.
Article in English | MEDLINE | ID: mdl-19359259

ABSTRACT

The reliable interpretation of Affymetrix GeneChip data is a multi-faceted problem. The interplay between biophysics, bioinformatics and mining of GeneChip surveys is leading to new insights into how best to analyse the data. Many of the molecular processes occurring on the surfaces of GeneChips result from the high surface density of probes. Interactions between neighbouring probes affect their rate and strength of hybridization to targets. Competing targets may hybridize to the same probe, and targets may partially bind to more than one probe. The formation of these partial hybrids results in a number of probes not reaching thermodynamic equilibrium during hybridization. Moreover, some targets fold up, or cross-hybridize to other targets. Furthermore, probes may fold and can undergo chemical saturation. There are also sequence-dependent differences in the rates of target desorption during the washing stage. Improvements in the mappings between probe sequence and biological databases are leading to more accurate gene expression profiles. Moreover, algorithms that combine the intensities of multiple probes into single measures of expression are increasingly dependent upon models of the hybridization processes occurring on GeneChips. The large repositories of GeneChip data can be searched for systematic effects across many experiments. This data mining has led to the discovery of a family of thousands of probes, which show correlated expression across thousands of GeneChip experiments. These probes contain runs of guanines, suggesting that G-quadruplexes are able to form on GeneChips. We discuss the impact of these structures on the interpretation of data from GeneChip experiments.


Subject(s)
Base Sequence , Computational Biology , DNA Probes , Guanine , Oligonucleotide Array Sequence Analysis , Algorithms , Animals , Biophysics , Databases, Genetic , Gene Expression Profiling/methods , Humans , Molecular Sequence Data , Molecular Structure , Nucleic Acid Conformation , Nucleic Acid Hybridization , Oligonucleotide Array Sequence Analysis/instrumentation , Oligonucleotide Array Sequence Analysis/methods
10.
BMC Genomics ; 9: 613, 2008 Dec 18.
Article in English | MEDLINE | ID: mdl-19094220

ABSTRACT

BACKGROUND: High Density Oligonucleotide arrays (HDONAs), such as the Affymetrix HG-U133A GeneChip, use sets of probes chosen to match specified genes, with the expectation that if a particular gene is highly expressed then all the probes in that gene's probe set will provide a consistent message signifying the gene's presence. However, probes that contain a G-spot (a sequence of four or more guanines) behave abnormally and it has been suggested that these probes are responding to some biochemical effect such as the formation of G-quadruplexes. RESULTS: We have tested this expectation by examining the correlation coefficients between pairs of probes using the data on thousands of arrays that are available in the NCBI Gene Expression Omnibus (GEO) repository. We confirm the finding that G-spot probes are poorly correlated with others in their probesets and reveal that, by contrast, they are highly correlated with one another. We demonstrate that the correlation is most marked when the G-spot is at the 5' end of the probe. CONCLUSION: Since these G-spot probes generally show little correlation with the other members of their probesets they are not fit for purpose and their values should be excluded when calculating gene expression values. This has serious implications, since more than 40% of the probesets in the HG-U133A GeneChip contain at least one such probe. Future array designs should avoid these untrustworthy probes.
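
The two ideas in this abstract, flagging probes that contain a G-spot (four or more consecutive guanines) and comparing probes by the correlation of their intensities across many arrays, can be sketched as below. This is an illustrative sketch only, not the authors' pipeline: the function names are hypothetical, the probe sequence in the usage note is made up, and Pearson correlation is assumed as the correlation measure.

```python
import math
import re
from typing import Optional, Sequence

# A "G-spot" as defined in the abstract: a run of four or more guanines.
G_SPOT = re.compile(r"G{4,}")


def has_g_spot(probe_seq: str) -> bool:
    """Return True if the probe sequence contains GGGG or a longer G run."""
    return G_SPOT.search(probe_seq.upper()) is not None


def g_spot_offset(probe_seq: str) -> Optional[int]:
    """0-based offset of the first G-spot from the start of the sequence
    as written (the 5' end when written 5'->3'), or None if there is none."""
    match = G_SPOT.search(probe_seq.upper())
    return match.start() if match else None


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two probes' intensities across the same arrays."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def usable_probes(probe_seqs: Sequence[str]) -> list:
    """Drop G-spot probes, following the abstract's recommendation to
    exclude them when calculating gene expression values."""
    return [seq for seq in probe_seqs if not has_g_spot(seq)]
```

For example, `has_g_spot("GGGGACTTACGATCGATTACGCATG")` flags a made-up 25-base probe with the run at its 5' end, and `pearson` could then be applied to each pair of probes in a probeset, using intensity vectors collected from the GEO arrays.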


Subject(s)
Chromosome Mapping/methods , DNA Probes , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Databases, Genetic , Gene Expression , Guanine , Sequence Analysis, DNA
11.
Biochem Soc Trans ; 36(Pt 3): 511-3, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18481992

ABSTRACT

We are developing a computational pipeline to use surveys of Affymetrix GeneChips as a discovery tool for unravelling some of the biology associated with post-transcriptional processing of RNA. This work involves the integration of a number of bioinformatics resources, from comparing annotations to processing images to determining the structure of transcripts. The rapidly growing datasets of GeneChips available to the community put us in a strong position to discover novel biology about post-transcriptional processing, and should enable us to determine the mechanisms by which some groups of genes make co-ordinated changes in their production of isoforms.


Subject(s)
Oligonucleotide Array Sequence Analysis/methods , RNA/metabolism , Alternative Splicing/genetics , Animals , Humans , Polyadenylation
12.
J Integr Bioinform ; 5(2), 2008 Aug 25.
Article in English | MEDLINE | ID: mdl-20134059

ABSTRACT

We have developed a computational pipeline to analyse large surveys of Affymetrix GeneChips, for example NCBI's Gene Expression Omnibus. GEO samples data for many organisms, tissues and phenotypes. Because of this experimental diversity, any observed correlations between probe intensities can be associated either with biology that is robust, such as common co-expression, or with systematic biases associated with the GeneChip technology. Our bioinformatics pipeline integrates the mapping of probes to exons, quality control checks on each GeneChip which identify flaws in hybridization quality, and the mining of correlations in intensities between groups of probes. The output from our pipeline has enabled us to identify systematic biases in GeneChip data. We are also able to use the pipeline as a discovery tool for biology. We have discovered that in the majority of cases, Affymetrix probesets on Human GeneChips do not measure one unique block of transcription. Instead we see numerous examples of outlier probes. Our study has also identified that in a number of probesets the mismatch probes are an informative diagnostic of expression, rather than providing a measure of background contamination. We report evidence for systematic biases in GeneChip technology associated with probe-probe interactions. We also see signatures associated with post-transcriptional processing of RNA, such as alternative polyadenylation.


Subject(s)
Genomics/instrumentation , Oligonucleotide Array Sequence Analysis/instrumentation , Databases, Genetic , Exons , Gene Expression Profiling , Genomics/methods
13.
Brief Bioinform ; 9(1): 25-33, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18057073

ABSTRACT

We present an overview of image-processing methods for Affymetrix GeneChips. All GeneChips are affected to some extent by spatially coherent defects and image processing has a number of potential impacts on the downstream analysis of GeneChip data. Fortunately, there are now a number of robust and accurate algorithms, which identify the most disabling defects. One group of algorithms concentrates on the transformation from the original hybridisation DAT image to the representative CEL file. Another set uses dedicated pattern recognition routines to detect different types of hybridisation defect in replicates. A third type exploits the information provided by public repositories of GeneChips (such as GEO). The use of these algorithms improves the sensitivity of GeneChips, and should be a prerequisite for studies in which there are only a few probes per relevant biological signal, such as exon arrays and SNP chips.


Subject(s)
Algorithms , Gene Expression Profiling/methods , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , In Situ Hybridization, Fluorescence/methods , Microscopy, Fluorescence, Multiphoton/methods , Oligonucleotide Array Sequence Analysis/methods
14.
Bioinformatics ; 20(17): 3206-13, 2004 Nov 22.
Article in English | MEDLINE | ID: mdl-15231534

ABSTRACT

MOTIVATION: Converting the vast quantity of free-format text found in journals into a concise, structured format makes the researcher's quest for information easier. Recently, several information extraction systems have been developed that attempt to simplify the retrieval and analysis of biological and medical data. Most of this work has used the abstract alone, owing to the convenience of access and the quality of data. Abstracts are generally available through central collections with easy direct access (e.g. PubMed). The full-text papers contain more information, but are distributed across many locations (e.g. publishers' web sites, journal web sites and local repositories), making access more difficult. In this paper, we present BioRAT, a new information extraction (IE) tool, specifically designed to perform biomedical IE, and which is able to locate and analyse both abstracts and full-length papers. BioRAT is a Biological Research Assistant for Text mining, and incorporates a document search ability with domain-specific IE. RESULTS: We show first, that BioRAT performs as well as existing systems, when applied to abstracts; and second, that significantly more information is available to BioRAT through the full-length papers than via the abstracts alone. Typically, less than half of the available information is extracted from the abstract, with the majority coming from the body of each paper. Overall, BioRAT recalled 20.31% of the target facts from the abstracts with 55.07% precision, and achieved 43.6% recall with 51.25% precision on full-length papers.
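
The recall and precision figures quoted in this abstract follow the standard information-extraction definitions, which the sketch below spells out. The function names are hypothetical, and the F1 score (harmonic mean of precision and recall) is an added convenience for comparing the two settings; the abstract itself reports only precision and recall.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of extracted facts that are correct."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of target facts that were actually extracted."""
    return true_pos / (true_pos + false_neg)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0


# Precision and recall as reported in the abstract, expressed as fractions.
ABSTRACTS = {"precision": 0.5507, "recall": 0.2031}
FULL_TEXT = {"precision": 0.5125, "recall": 0.4360}

# F1 is derived here for illustration; it is not a figure from the abstract.
f1_abstracts = f1_score(ABSTRACTS["precision"], ABSTRACTS["recall"])
f1_full_text = f1_score(FULL_TEXT["precision"], FULL_TEXT["recall"])
```

Comparing `f1_abstracts` with `f1_full_text` puts a single number on the abstract's point: on full-length papers recall roughly doubles while precision drops only slightly.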


Subject(s)
Abstracting and Indexing/methods , Biology/methods , Databases, Bibliographic , Information Storage and Retrieval/methods , Natural Language Processing , Periodicals as Topic , Software , Algorithms , Artificial Intelligence , Bibliometrics , Database Management Systems , Documentation/methods , User-Computer Interface , Vocabulary, Controlled