ABSTRACT
The combination of tandem mass spectrometry and database searching is one of the most popular technologies for protein identification. However, only proteins present in the searched database can be identified, and current databases are far from complete. It is therefore necessary to mine MS/MS data comprehensively, and novel protein identification is the most important part of that effort. Novel proteins can be defined at three levels according to the annotation of their sequences and functions. As a part of protein identification, novel protein identification relies on two main approaches: de novo sequencing combined with similarity searching, and searching against nucleotide databases such as EST or genome databases. Several mature or newly developed methods and techniques are summarized, and the problems and strategies discussed here should be helpful for related research.
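Searching MS/MS data against EST or genome databases requires that the nucleotide sequences first be translated into candidate protein sequences. The sketch below illustrates one common way to do this, a six-frame translation using the standard genetic code. It is only an illustration under stated assumptions: the function names (`reverse_complement`, `translate`, `six_frame_translation`) are hypothetical, and the abstract does not say which translation procedure the reviewed methods use.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate both strands in all three reading frames (hypothetical helper)."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

Each of the six translated frames could then be treated like an entry of a protein database; stop codons appear as `*` and would normally be used to split a frame into open reading frames before searching.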
ABSTRACT
Data analysis poses a significant challenge to large-scale proteomics studies. Based on the structured, controlled vocabularies of the Gene Ontology (GO) and the GO annotations from related databases, a strategy composed of several programs and local databases was developed to identify the functional distribution and the significantly enriched functional categories of a proteomic expression profile. It helps in understanding the overall functions of the identified proteins and supplies fundamental information for further bioinformatics exploration. This strategy has been successfully used in the Human Fetal Liver (HFL) proteomic research and is available online at http://www.hupo.org.cn/GOfact/.
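To make the idea of a "significantly enriched functional category" concrete, the following minimal sketch computes a one-sided hypergeometric p-value for a single GO category. The abstract does not state which statistical test the strategy applies, so the hypergeometric test is an assumption (it is a common choice for this kind of analysis), and the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated proteins by chance.

    population: number of proteins in the reference set
    annotated:  reference proteins annotated with the GO category
    sample:     number of identified proteins (the expression profile)
    hits:       identified proteins annotated with the GO category
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the upper tail P(X = k) for k = hits .. upper.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

A small p-value suggests that the category is over-represented among the identified proteins relative to the reference set. When many categories are tested at once, a multiple-testing correction would normally be applied to the resulting p-values.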
ABSTRACT
An increasing number of DNA sequences have been obtained since the start of the Human Genome Project, and a powerful system is needed for data mining on these sequences. Using a personal computer running the Linux operating system, the Phred/Phrap/Consed and BLAST software packages were used to construct a platform for batch sequence analysis, including reading raw DNA sequences from chromatogram files (base calling), vector sequence removal, contig analysis (sequence assembly), repeat sequence identification, and sequence similarity analysis. The results demonstrated that this robust platform could accelerate data analysis for large-scale DNA sequencing.