Benchmarking computational methods to identify spatially variable genes and peaks.

Li, Zhijian; Patel, Zain M; Song, Dongyuan; Yan, Guanao; Li, Jingyi Jessica; Pinello, Luca.

bioRxiv ; 2023 Dec 03.

Article in English | MEDLINE | ID: mdl-38076922

ABSTRACT

Spatially resolved transcriptomics offers unprecedented insight by enabling the profiling of gene expression within the intact spatial context of cells, effectively adding a new and essential dimension to data interpretation. To efficiently detect spatial structure of interest, an essential step in analyzing such data involves identifying spatially variable genes. Despite researchers having developed several computational methods to accomplish this task, the lack of a comprehensive benchmark evaluating their performance remains a considerable gap in the field. Here, we present a systematic evaluation of 14 methods using 60 simulated datasets generated by four different simulation strategies, 12 real-world transcriptomics datasets, and three spatial ATAC-seq datasets. We find that spatialDE2 consistently outperforms the other benchmarked methods, and Moran's I achieves competitive performance in different experimental settings. Moreover, our results reveal that more specialized algorithms are needed to identify spatially variable peaks.

Reconstruction of full-length LINE-1 progenitors from ancestral genomes.

Campitelli, Laura F; Yellan, Isaac; Albu, Mihai; Barazandeh, Marjan; Patel, Zain M; Blanchette, Mathieu; Hughes, Timothy R.

Genetics ; 221(3), 2022 07 04.

Article in English | MEDLINE | ID: mdl-35552404

ABSTRACT

Sequences derived from the Long INterspersed Element-1 (L1) family of retrotransposons occupy at least 17% of the human genome, with 67 distinct subfamilies representing successive waves of expansion and extinction in mammalian lineages. L1s contribute extensively to gene regulation, but their molecular history is difficult to trace, because most are present only as truncated and highly mutated fossils. Consequently, L1 entries in current databases of repeat sequences are composed mainly of short diagnostic subsequences, rather than full functional progenitor sequences for each subfamily. Here, we have coupled 2 levels of sequence reconstruction (at the level of whole genomes and L1 subfamilies) to reconstruct progenitor sequences for all human L1 subfamilies that are more functionally and phylogenetically plausible than existing models. Most of the reconstructed sequences are at or near the canonical length of L1s and encode uninterrupted ORFs with expected protein domains. We also show that the presence or absence of binding sites for KRAB-C2H2 Zinc Finger Proteins, even in ancient-reconstructed progenitor L1s, mirrors binding observed in human ChIP-exo experiments, thus extending the arms race and domestication model. RepeatMasker searches of the modern human genome suggest that the new models may be able to assign subfamily resolution identities to previously ambiguous L1 instances. The reconstructed L1 sequences will be useful for genome annotation and functional study of both L1 evolution and L1 contributions to host regulatory networks.

Subject(s)

Long Interspersed Nucleotide Elements , Retroelements , Animals , Evolution, Molecular , Genome, Human , Humans , Mammals/genetics , Open Reading Frames , Phylogeny , Repetitive Sequences, Nucleic Acid , Retroelements/genetics

Global properties of regulatory sequences are predicted by transcription factor recognition mechanisms.

Patel, Zain M; Hughes, Timothy R.

Genome Biol ; 22(1): 285, 2021 10 07.

Article in English | MEDLINE | ID: mdl-34620190

ABSTRACT

BACKGROUND: Mammalian genomes contain millions of putative regulatory sequences, which are delineated by binding of multiple transcription factors. The degree to which spacing and orientation constraints among transcription factor binding sites contribute to the recognition and identity of regulatory sequence is an unresolved but important question that impacts our understanding of genome function and evolution. Global mechanisms that underlie phenomena including the size of regulatory sequences, their uniqueness, and their evolutionary turnover remain poorly described.

RESULTS: Here, we ask whether models incorporating different degrees of spacing and orientation constraints among transcription factor binding sites are broadly consistent with several global properties of regulatory sequence. These properties include length, sequence diversity, turnover rate, and dominance of specific TFs in regulatory site identity and cell type specification. Models with and without spacing and orientation constraints are generally consistent with all observed properties of regulatory sequence, and with regulatory sequences being fundamentally small (~1 nucleosome). Uniqueness of regulatory regions and their rapid evolutionary turnover are expected under all models examined. An intriguing issue we identify is that the complexity of eukaryotic regulatory sites must scale with the number of active transcription factors, in order to accomplish observed specificity.

CONCLUSIONS: Models of transcription factor binding with or without spacing and orientation constraints predict that regulatory sequences should be fundamentally short, unique, and turn over rapidly. We posit that the existence of master regulators may be, in part, a consequence of evolutionary pressure to limit the complexity and increase evolvability of regulatory sites.

Subject(s)

Regulatory Elements, Transcriptional , Transcription Factors/metabolism , Binding Sites , Genome, Human , Humans , Models, Genetic , Nucleosomes , Protein Binding
