1.
PLoS One ; 10(4): e0122855, 2015.
Article in English | MEDLINE | ID: mdl-25874768

ABSTRACT

The vast amount and diversity of content shared on social media can pose a challenge for any business wanting to use it to identify potential customers. In this paper, we investigate the use of both unsupervised and supervised learning methods for target audience classification on Twitter with minimal annotation effort. Topic domains were automatically discovered from content shared by followers of an account owner using Twitter Latent Dirichlet Allocation (LDA). A Support Vector Machine (SVM) ensemble was then trained using content from different account owners of the various topic domains identified by Twitter LDA. Experimental results show that the methods presented can identify a target audience with high accuracy. In addition, we show that using a statistical inference approach such as bootstrapping in over-sampling, instead of random sampling, to construct training datasets yields a better classifier in an SVM ensemble. We conclude that such an ensemble system can take advantage of data diversity, which enables real-world applications for differentiating prospective customers from the general audience, leading to business advantage in the crowded social media space.
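The bootstrapped over-sampling step mentioned in the abstract can be illustrated with a minimal sketch. This is an assumption about what such a step might look like, not the authors' procedure: the names `bootstrap_oversample` and `majority_vote`, the toy labels, and the choice to resample every class up to the size of the largest class are all hypothetical, and the SVM base learners themselves are left out.

```python
import random
from collections import Counter


def bootstrap_oversample(samples, labels, rng):
    """Build a class-balanced training set by drawing, with replacement,
    as many examples from each class as the largest class contains."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(items) for items in by_class.values())
    out_x, out_y = [], []
    for y, items in by_class.items():
        # rng.choices samples with replacement, i.e. a bootstrap draw.
        out_x.extend(rng.choices(items, k=target))
        out_y.extend([y] * target)
    return out_x, out_y


def majority_vote(predictions):
    """Combine one predicted label per ensemble member into a single label."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical toy data: five "general" accounts and one "target" account.
rng = random.Random(0)
texts = ["t1", "t2", "t3", "t4", "t5", "t6"]
labels = ["general"] * 5 + ["target"]

# One bootstrapped training set per (hypothetical) ensemble member.
train_sets = [bootstrap_oversample(texts, labels, rng) for _ in range(3)]
```

Each ensemble member would be trained on its own resampled set, and `majority_vote` would then merge the members' predictions for a new account.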


Subject(s)
Data Mining/methods , Marketing/methods , Social Media/statistics & numerical data , Support Vector Machine , Blogging/economics , Blogging/statistics & numerical data , Data Mining/statistics & numerical data , Humans , Social Media/economics
2.
FEBS Lett ; 580(9): 2216-26, 2006 Apr 17.
Article in English | MEDLINE | ID: mdl-16574106

ABSTRACT

Hepatocellular carcinoma (HCC) is the most common primary cancer of the liver. Thus there is great interest in identifying novel HCC diagnostic markers for early detection of the disease and tumour-specific associated proteins as potential therapeutic targets in the treatment of HCC. Currently, we are screening for early biomarkers as well as studying the development of HCC by identifying the differentially expressed proteins of HCC tissues during different stages of disease progression. We have isolated, by reverse transcriptase polymerase chain reaction (RT-PCR), a 1741 bp cDNA encoding a protein that is differentially expressed in HCC. This novel protein was initially identified by proteome analysis and we designate it as Hcc-2. The protein is upregulated in poorly-differentiated HCC but unchanged in well-differentiated HCC. The full-length transcript encodes a protein of 363 amino acids that has three thioredoxin (Trx) (CGHC) domains and an ER retention signal motif (KDEL). GFP tagging of this protein confirmed that it is localized predominantly to the cytoplasm when expressed in mammalian cells. Protein alignment analysis shows that it is a variant of the TXNDC5 gene, and the human variants found in GenBank all show close similarity in protein sequence. Functionally, it exhibits the anticipated reductase activity in the insulin disulfide reduction assay, but its other biological roles in cell function remain to be elucidated. This work demonstrates that an integrated proteomics and genomics approach can be a very powerful means of discovering potential diagnostic and therapeutic protein targets for cancer therapy.
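The two sequence features named in the abstract, the CGHC thioredoxin active-site motif and the C-terminal KDEL retention signal, can be located with a simple string scan. The sketch below is illustrative only: the function names are hypothetical and the `toy` sequence is invented for the example; it is not the Hcc-2 sequence.

```python
import re


def find_trx_motifs(protein):
    """Return the 0-based start position of every CGHC motif."""
    return [m.start() for m in re.finditer("CGHC", protein)]


def has_kdel_signal(protein):
    """True if the sequence ends with the KDEL ER-retention tetrapeptide."""
    return protein.endswith("KDEL")


# Hypothetical toy sequence for illustration, NOT the real Hcc-2 protein.
toy = "MAAWCGHCKALTTFAPWCGHCQRLNVKFYAPWCGHCKNLEPKDEL"

motif_positions = find_trx_motifs(toy)
kdel_present = has_kdel_signal(toy)
```

A real analysis would run the same scan on the translated 363-residue sequence and would normally rely on a domain database rather than on exact string matching.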


Subject(s)
Biomarkers, Tumor/biosynthesis , Carcinoma, Hepatocellular/enzymology , Gene Expression Regulation, Neoplastic , Liver Neoplasms/enzymology , Neoplasm Proteins/biosynthesis , Thioredoxins/biosynthesis , Amino Acid Motifs/genetics , Amino Acid Sequence , Animals , Biomarkers, Tumor/genetics , CHO Cells , Carcinoma, Hepatocellular/diagnosis , Carcinoma, Hepatocellular/genetics , Cell Differentiation , Cricetinae , Cricetulus , Endoplasmic Reticulum/enzymology , Endoplasmic Reticulum/genetics , Gene Expression Profiling/methods , Humans , Liver Neoplasms/diagnosis , Liver Neoplasms/genetics , Molecular Sequence Data , Neoplasm Proteins/genetics , Protein Structure, Tertiary , Proteome/biosynthesis , Proteomics/methods , Reverse Transcriptase Polymerase Chain Reaction/methods , Sequence Analysis, RNA/methods , Sequence Homology, Amino Acid , Thioredoxins/genetics , Up-Regulation
3.
Proteomics ; 6(6): 1758-69, 2006 Mar.
Article in English | MEDLINE | ID: mdl-16456885

ABSTRACT

In the field of proteomics, the growing difficulty of unifying data formats across different platforms, instruments, and laboratory documentation systems greatly hinders experimental data verification, exchange, and comparison. Therefore, it is essential to establish standard formats for every necessary aspect of proteomics data. One of the recently published data models is the proteomics experiment data repository (PEDRo) [Taylor, C. F., Paton, N. W., Garwood, K. L., Kirby, P. D. et al., Nat. Biotechnol. 2003, 21, 247-254]. Compliant with this format, we developed the systematic proteomics laboratory analysis and storage hub (SPLASH) database system as an informatics infrastructure to support proteomics studies. It consists of three modules and provides proteomics researchers with a common platform to store, manage, search, analyze, and exchange their data. (i) The data maintenance module covers experimental data entry and update, uploading of experimental results in batch mode, and data exchange in the original PEDRo format. (ii) The data search module provides several means to search the database and to view either the protein information or the differential expression display by clicking on a gel image. (iii) The data mining module contains tools that perform biochemical pathway, statistics-associated gene ontology, and other comparative analyses on all the sample sets to interpret their biological meaning. These features make SPLASH a practical and powerful tool for the proteomics community.


Subject(s)
Database Management Systems , Databases, Factual , Databases, Protein , Information Storage and Retrieval/methods , Proteomics , Mass Spectrometry , Reproducibility of Results , Software , User-Computer Interface
4.
Proteomics ; 5(8): 2258-71, 2005 May.
Article in English | MEDLINE | ID: mdl-15852300

ABSTRACT

Proteome analysis of human hepatocellular carcinoma tissues was conducted using two-dimensional difference gel electrophoresis coupled with mass spectrometry. Paired samples from the normal and tumor regions of resected human liver were labeled with Cy3 and Cy5, respectively, while the pooled standard sample was labeled with Cy2. After analysis by the DeCyder software, protein spots that exhibited at least a two-fold difference in intensity were excised for in-gel tryptic digestion and matrix-assisted laser desorption/ionization-time of flight mass spectrometry. A total of 6 and 42 proteins were successfully identified from the well- and poorly-differentiated samples, respectively. The majority of these proteins are related to detoxification/oxidative stress and metabolism. Three down-regulated metabolic enzymes that are involved in the methylation cycle in the liver, methionine adenosyltransferase, glycine N-methyltransferase, and betaine-homocysteine S-methyltransferase, are of special interest. Their expression levels, especially that of methionine adenosyltransferase, seemed to have a major influence on the level of S-adenosylmethionine (AdoMet), a vital intermediate metabolite required for the proper functioning of the liver. Recent work has shown that chronic deficiency of AdoMet in the liver results in spontaneous development of steatohepatitis and hepatocellular carcinoma, and hence the down-regulation of hepatic methionine adenosyltransferase in our hepatocellular carcinoma samples is in line with this observation. Moreover, when the differentially expressed proteins from our human hepatocellular carcinoma samples are compared with those from the liver tissues of knockout mice deficient in methionine adenosyltransferase, there is a fairly good correlation between them.
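The two-fold selection criterion described in the abstract can be sketched as a simple filter. This is a simplified illustration, not the DeCyder algorithm: it compares raw tumor and normal intensities directly and ignores the Cy2 internal standard, and the spot identifiers and intensity values are hypothetical.

```python
def fold_change(tumor, normal):
    """Signed fold change: positive when the spot is more intense in tumor,
    negative when it is more intense in normal tissue.
    Assumes both intensities are strictly positive."""
    ratio = tumor / normal
    return ratio if ratio >= 1 else -1 / ratio


def select_spots(spots, threshold=2.0):
    """Keep spots whose absolute fold change meets the threshold."""
    selected = {}
    for spot_id, (tumor, normal) in spots.items():
        change = fold_change(tumor, normal)
        if abs(change) >= threshold:
            selected[spot_id] = change
    return selected


# Hypothetical (tumor, normal) intensities for three spots.
spots = {
    "spot_1": (300.0, 100.0),  # three-fold up in tumor
    "spot_2": (100.0, 400.0),  # four-fold down in tumor
    "spot_3": (120.0, 100.0),  # below the two-fold threshold
}

candidates = select_spots(spots)
```

Only the spots kept in `candidates` would go forward to excision and mass spectrometry under this criterion.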


Subject(s)
Carcinoma, Hepatocellular/chemistry , Electrophoresis, Gel, Two-Dimensional , Liver Neoplasms/chemistry , Mass Spectrometry , Proteome/analysis , Carcinoma, Hepatocellular/pathology , Humans , Image Processing, Computer-Assisted , Liver Neoplasms/pathology , Silver Staining , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization , Trypsin/pharmacology
5.
Proteomics ; 5(4): 876-84, 2005 Mar.
Article in English | MEDLINE | ID: mdl-15717327

ABSTRACT

Knowledge of protein-protein interactions is useful for elucidating protein function via the concept of 'guilt-by-association'. A statistical learning method, Support Vector Machine (SVM), has recently been explored for the prediction of protein-protein interactions using artificial shuffled sequences as hypothetical noninteracting proteins, and it has shown promising results (Bock, J. R., Gough, D. A., Bioinformatics 2001, 17, 455-460). It remains unclear, however, how the prediction accuracy is affected if real protein sequences are used to represent noninteracting proteins. In this work, this effect is assessed by comparing the results derived from the use of real protein sequences with those derived from the use of shuffled sequences. The real protein sequences of hypothetical noninteracting proteins are generated from an exclusion analysis in combination with subcellular localization information of interacting proteins found in the Database of Interacting Proteins. Prediction accuracy using real protein sequences is 76.9% compared to 94.1% using artificial shuffled sequences. The discrepancy likely arises because separating two sets of real protein sequences is expected to be harder than separating a set of real protein sequences from a set of artificial sequences. The use of real protein sequences for training an SVM classification system is expected to give better prediction results in practical cases. This is tested by using both SVM systems to predict putative protein partners of a set of thioredoxin-related proteins. The prediction results are consistent with observations, suggesting that real sequences are more practically useful in the development of SVM classification systems for protein-protein interaction prediction.
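The idea of an artificial shuffled sequence used as a hypothetical noninteracting protein can be sketched as a random permutation of a real sequence's residues. This is the simplest possible shuffle and an assumption on our part; the cited work may use a different scheme. The function name and the input sequence are hypothetical.

```python
import random


def shuffled_negative(sequence, rng):
    """Return a random permutation of the residues in `sequence`.
    Length and amino-acid composition are preserved; residue order is not."""
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)


# Hypothetical input sequence for illustration only.
rng = random.Random(42)
real = "MKTAYIAKQRQISFVKSHFSRQ"
decoy = shuffled_negative(real, rng)
```

Because the decoy keeps the composition of the real protein but loses its order, a classifier can separate such decoys from real sequences more easily than it can separate two sets of real proteins, which is consistent with the accuracy gap reported in the abstract.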


Subject(s)
Computational Biology/methods , Databases, Protein , Proteomics/methods , Algorithms , Animals , Drosophila melanogaster , Humans , Protein Binding , Protein Conformation , Protein Folding , Proteins/chemistry , ROC Curve , Reproducibility of Results , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/chemistry , Sequence Analysis, Protein/methods , Software , Thioredoxins/chemistry
6.
J Chromatogr B Analyt Technol Biomed Life Sci ; 771(1-2): 303-28, 2002 May 05.
Article in English | MEDLINE | ID: mdl-12016006

ABSTRACT

Hepatocellular carcinoma (HCC or hepatoma) is the most common primary cancer of the liver. Persistent viral infection by the hepatitis B or C virus is probably the most important cause of HCC worldwide. HCC is responsible for approximately one million deaths each year, predominantly in the underdeveloped and developing countries, but its incidence is also on the rise in the developed countries. For most patients suffering from HCC, long-term survival is rare, as they present late and are often unsuitable for curative treatment. Thus there is great interest in identifying novel HCC diagnostic markers for early detection of the disease, and tumour-specific associated proteins as potential therapeutic targets in the treatment of HCC. Proteome analyses of HCC cell lines and liver tumour tissues should facilitate the screening and discovery of these HCC proteins. The creation of a comprehensive HCC proteome database would be an important first step towards achieving this goal. This review presents an update of the two-dimensional electrophoresis proteome database of the cell line HCC-M, which is now freely accessible through the World Wide Web at http://proteome.btc.nus.edu.sg/hccm/.


Subject(s)
Carcinoma, Hepatocellular/metabolism , Databases, Protein , Liver Neoplasms/metabolism , Proteome , Electrophoresis, Gel, Two-Dimensional , Humans