Search | VHL Regional Portal

Integrated transcriptome catalog of Tenualosa ilisha as a resource for gene discovery and expression profiling.

Chowdhury, Md Arko Ayon; Islam, Md Rakibul; Amin, Al; Mou, Sadia Noor; Ullah, Kazi Newaz; Baten, Abdul; Shoyaib, Mohammad; Ali, Amin Ahsan; Chowdhury, Farhana Tasnim; Rahi, Md Lifat; Khan, Haseena; Amin, M Ashraful; Islam, Mohammad Riazul.

Sci Data ; 10(1): 214, 2023 Apr 17.

Article in English | MEDLINE | ID: mdl-37062771

ABSTRACT

The silver pride of Bangladesh, the migratory shad Tenualosa ilisha (Hilsa), makes the highest contribution to the total fish production of Bangladesh. Despite this noteworthy contribution, no well-annotated transcriptome data are available for the species. Here we report a transcriptomic catalog of Hilsa, constructed by assembling RNA-Seq reads from different tissues of the fish, including brain, gill, kidney, liver, and muscle. Hilsa fish were collected from different aquatic habitats (fresh, brackish, and sea water), and sequencing was performed on a next-generation sequencing (NGS) platform. De novo assembly of the sequences obtained from 46 cDNA libraries revealed 462,085 transcript isoforms, which were subsequently annotated using the Universal Protein Resource Knowledgebase (UniProtKB) as a reference. All steps, from sampling to final annotation, are reported here together with the workflow. This study provides a significant resource for ongoing and future research on Hilsa, supporting transcriptome-based expression profiling and the identification of candidate genes.

Subject(s)

Fishes; Transcriptome; Animals; Fishes/genetics; Gene Expression Profiling; Genetic Association Studies; Molecular Sequence Annotation; Protein Isoforms/genetics

Use of relevancy and complementary information for discriminatory gene selection from high-dimensional gene expression data.

Haque, Md Nazmul; Sharmin, Sadia; Ali, Amin Ahsan; Sajib, Abu Ashfaqur; Shoyaib, Mohammad.

PLoS One ; 16(10): e0230164, 2021.

Article in English | MEDLINE | ID: mdl-34613963

ABSTRACT

With the advent of high-throughput technologies, the life sciences are generating huge amounts of varied biomolecular data. Global gene expression profiles provide a snapshot of all the genes that are transcribed in a cell or tissue under a particular condition. The high dimensionality of such data (i.e., a very large number of features/genes analyzed with far fewer samples) makes it difficult to identify, de novo, the key genes (biomarkers) that truly contribute to a particular phenotype or condition, such as cancer. Among the criteria in the existing literature for identifying key genes from gene expression data, mutual information (MI) is one of the most successful. However, existing methods do not correct MI for finite sample size. Dynamic discretization of genes is also important for more relevant gene selection, yet the available methods do not consider it. In addition, current studies usually suggest removing redundant genes, which is particularly inappropriate for biological data, because a group of genes may act together on downstream proteins. Thus, genes that provide additional useful information about the disease should be retained even if they are redundant. To address these issues, we propose a Mutual information based Gene Selection method (MGS) for selecting informative genes. Moreover, to rank the selected genes, we extend MGS with two ranking methods: MGSf, based on frequency, and MGSrf, based on Random Forest. The proposed method not only obtained better classification rates than recently reported methods on datasets derived from different gene expression studies, but also detected key genes relevant to pathways with a causal relationship to the disease, which indicates that it will also be able to find the responsible genes in data for an unknown disease.
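The idea of scoring genes by finite-sample-corrected mutual information can be illustrated with a minimal sketch. This is not the authors' MGS implementation: the abstract does not say which correction or discretization MGS uses, so the Miller-Madow bias correction and the equal-width binning below are assumptions, and the function names (`entropy_mm`, `mutual_information_mm`, `discretize`, `rank_genes`) are hypothetical.

```python
import math
from collections import Counter


def entropy_mm(symbols):
    """Plug-in entropy in nats plus the Miller-Madow term (m - 1) / (2n).

    m is the number of distinct observed symbols and n the sample size.
    Assumption: MGS may use a different finite-sample correction.
    """
    n = len(symbols)
    counts = Counter(symbols)
    plug_in = -sum((c / n) * math.log(c / n) for c in counts.values())
    return plug_in + (len(counts) - 1) / (2 * n)


def mutual_information_mm(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), with each entropy bias-corrected.

    The corrected estimate can be slightly negative for unrelated variables.
    """
    return entropy_mm(x) + entropy_mm(y) - entropy_mm(list(zip(x, y)))


def discretize(values, n_bins=3):
    """Equal-width binning (a stand-in for the unspecified dynamic scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def rank_genes(expression, labels, n_bins=3):
    """Return gene names sorted by corrected MI with the class labels.

    expression maps a gene name to its expression values, one per sample.
    """
    scores = {
        gene: mutual_information_mm(discretize(values, n_bins), labels)
        for gene, values in expression.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

With few samples, the correction penalizes genes whose discretized values spread over many joint cells by chance, which is the situation the abstract describes for high-dimensional, low-sample data.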

Subject(s)

Gene Expression Profiling/methods; Gene Expression/genetics; High-Throughput Screening Assays/methods; Algorithms; Humans; Phenotype

A noise-aware coding scheme for texture classification.

Shoyaib, Mohammad; Abdullah-Al-Wadud, M; Chae, Oksam.

Sensors (Basel) ; 11(8): 8028-44, 2011.

Article in English | MEDLINE | ID: mdl-22164060

ABSTRACT

Texture-based analysis of images is a very common and much-discussed topic in computer vision and image processing. Several methods have already been proposed to codify texture micro-patterns (texlets) in images. Most of these methods perform well when a given image is noise-free, but real-world images contain different types of signal-independent as well as signal-dependent noise originating from different sources, including the camera sensor itself. Hence, it is necessary to identify false textures that appear because of noise and thereby achieve a reliable representation of texlets. In this work, we define an adaptive noise band (ANB) to approximate, to a certain extent, the amount of noise contamination around a pixel. Based on this ANB, we generate reliable codes, named noise tolerant ternary pattern (NTTP), to represent the texlets in an image. Extensive experiments on several datasets from renowned texture databases, such as Outex and Brodatz, show that NTTP performs much better than the state-of-the-art methods.
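How a tolerance band turns pixel differences into a ternary code can be sketched as follows. This is a generic local ternary pattern over a 3x3 neighbourhood, not the NTTP of the paper: the abstract does not describe how the adaptive noise band is computed, so the fixed `band` parameter and the function names are hypothetical stand-ins.

```python
# Clockwise neighbour positions in a 3x3 patch, starting at the top-left.
NEIGHBOURS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def ternary_code(patch, band):
    """Return eight ternary digits for a 3x3 patch (list of three rows).

    A neighbour maps to +1 if it exceeds the centre by more than `band`,
    to -1 if it falls below the centre by more than `band`, and to 0 if
    the difference lies inside the band and is treated as noise.
    Assumption: `band` is fixed here; the paper adapts it per pixel.
    """
    centre = patch[1][1]
    code = []
    for row, col in NEIGHBOURS:
        diff = patch[row][col] - centre
        if diff > band:
            code.append(1)
        elif diff < -band:
            code.append(-1)
        else:
            code.append(0)
    return code


def split_code(code):
    """Split a ternary code into (upper, lower) binary pattern integers."""
    upper = sum(1 << i for i, digit in enumerate(code) if digit == 1)
    lower = sum(1 << i for i, digit in enumerate(code) if digit == -1)
    return upper, lower
```

With `band=0` every small fluctuation around the centre value changes the code, whereas a wider band maps those fluctuations to 0 and leaves only the larger intensity differences in the pattern.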

Subject(s)

Pattern Recognition, Automated; Algorithms; Databases, Factual; Image Interpretation, Computer-Assisted/methods; Image Processing, Computer-Assisted/methods; Models, Statistical; Noise; Normal Distribution; Pattern Recognition, Automated/methods; Photons; Reproducibility of Results; Surface Properties

Pol II promoter prediction using characteristic 4-mer motifs: a machine learning approach.

Anwar, Firoz; Baker, Syed Murtuza; Jabid, Taskeed; Mehedi Hasan, Md; Shoyaib, Mohammad; Khan, Haseena; Walshe, Ray.

BMC Bioinformatics ; 9: 414, 2008 Oct 04.

Article in English | MEDLINE | ID: mdl-18834544

ABSTRACT

BACKGROUND: Eukaryotic promoter prediction using computational analysis techniques is one of the most difficult tasks in computational genomics, and it is essential for constructing and understanding genetic regulatory networks. The increased availability of sequence data for various eukaryotic organisms in recent years has created a need for better tools and techniques for the prediction and analysis of promoters in eukaryotic sequences. Many promoter prediction methods and tools have been developed to date, but they have yet to provide acceptable predictive performance. One obvious way to improve on current methods is to devise a better system for selecting appropriate features of promoters that distinguish them from non-promoters. Secondly, improved performance can be achieved by enhancing the predictive ability of the machine learning algorithms used. RESULTS: In this paper, a novel approach is presented in which 128 4-mer motifs, in conjunction with a non-linear machine-learning algorithm utilising a Support Vector Machine (SVM), are used to distinguish between promoter and non-promoter DNA sequences. Applied to plant, Drosophila, human, mouse and rat sequences, the classification model showed 7-fold cross-validation accuracies of 83.81%, 94.82%, 91.25%, 90.77% and 82.35%, respectively. The high sensitivity and specificity values (0.86 and 0.90 for plant; 0.96 and 0.92 for Drosophila; 0.88 and 0.92 for human; 0.78 and 0.84 for mouse; and 0.82 and 0.80 for rat) demonstrate that this technique is less prone to false positive results and exhibits better performance than many other tools. Moreover, this model successfully identifies the location of promoters using a TATA weight matrix. CONCLUSION: The high sensitivity and specificity indicate that 4-mer frequencies in conjunction with supervised machine-learning methods can be beneficial in the identification of RNA pol II promoters compared with other methods. This approach can be extended to identify promoters in sequences from other eukaryotic genomes.
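The 4-mer frequency features mentioned in the abstract can be illustrated with a minimal sketch. It computes relative frequencies for all 256 possible 4-mers; the abstract does not state how the 128 characteristic motifs were chosen, so no selection step is shown, and the names `KMERS` and `kmer_frequencies` are hypothetical. Skipping windows that contain ambiguous bases is also an assumption.

```python
from itertools import product

K = 4
# All 4**4 = 256 possible 4-mers over the DNA alphabet, in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_frequencies(sequence):
    """Return the relative frequency of every 4-mer in `sequence`.

    Overlapping windows are counted. Windows containing characters other
    than A, C, G, T (for example N) are skipped, which is an assumption
    rather than something stated in the abstract.
    """
    sequence = sequence.upper()
    counts = dict.fromkeys(KMERS, 0)
    total = 0
    for start in range(len(sequence) - K + 1):
        window = sequence[start:start + K]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[kmer] / total for kmer in KMERS]
```

Each sequence becomes a fixed-length vector, so promoter and non-promoter examples could then be passed to any standard classifier, such as an SVM, after selecting a subset of the columns.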

Subject(s)

Artificial Intelligence; Nucleic Acid Conformation; Promoter Regions, Genetic; RNA Polymerase II/genetics; Sequence Analysis, DNA/methods; Animals; Databases, Nucleic Acid; Drosophila Proteins/genetics; Eukaryotic Cells; Genomics/methods; Humans; Mice; Rats; Reproducibility of Results; Sensitivity and Specificity; Structure-Activity Relationship
