Results 1 - 4 of 4
1.
IEEE Trans Neural Netw Learn Syst ; 34(10): 7014-7023, 2023 Oct.
Article in English | MEDLINE | ID: mdl-35113788

ABSTRACT

In this work, we describe our efforts to address two typical challenges that arise when popular text classification methods are applied to text moderation: the representation of multibyte characters and word obfuscations. Specifically, a multihot byte-level scheme is developed to significantly reduce the dimension of one-hot character-level encoding caused by the multiplicity of instance-scarce non-ASCII characters. In addition, we introduce a simple yet effective weighting approach for fusing n-gram features to empower classical logistic regression. Surprisingly, it greatly outperforms well-tuned representative neural networks. As a continued effort toward text moderation, we analyze the current state-of-the-art (SOTA) algorithm, bidirectional encoder representations from transformers (BERT), which works well in context understanding but performs poorly on intentional word obfuscations. To resolve this crux, we develop an enhanced variant that remedies this drawback by integrating byte and character decomposition. It advances the SOTA performance on the largest abusive language datasets, as demonstrated by our comprehensive experiments. Our work offers a feasible and effective framework to tackle word obfuscations.
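The abstract does not spell out the multihot byte-level scheme, so the following is only a minimal sketch of one plausible reading: each character is encoded as UTF-8, and a fixed 256-dimensional vector is set to 1 at every byte value that appears. The function names `multihot_bytes` and `encode_text` and the choice of UTF-8 are assumptions made for illustration, not the authors' implementation. The point it illustrates is that the vector width stays at 256 no matter how many distinct non-ASCII characters occur, whereas a one-hot character encoding grows with the alphabet.

```python
from typing import List


def multihot_bytes(ch: str, dim: int = 256) -> List[int]:
    """Return a multihot vector marking the UTF-8 byte values of one character.

    Assumption: UTF-8 is the byte encoding; the abstract does not name one.
    """
    vec = [0] * dim
    for b in ch.encode("utf-8"):
        vec[b] = 1  # a byte value is always in range(256)
    return vec


def encode_text(text: str) -> List[List[int]]:
    """Encode a string as one multihot row per character."""
    return [multihot_bytes(ch) for ch in text]


if __name__ == "__main__":
    # An ASCII character sets one position; a multibyte character sets several.
    print(sum(multihot_bytes("a")))
    print(sum(multihot_bytes("é")))
```

Note that this simple variant discards byte order within a character, so two characters built from the same set of byte values would collide; a real scheme may handle this differently.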

2.
Microbiol Resour Announc ; 11(11): e0070422, 2022 Nov 17.
Article in English | MEDLINE | ID: mdl-36255294

ABSTRACT

Phage Vardy is a lytic siphovirus isolated from creek soil in Cullowhee, NC, using Gordonia rubripertincta NRRL B-16540. Vardy's 60,144-bp genome contains 90 predicted genes and five copies of a 50-bp motif that may regulate gene expression. Based on gene content similarity, Vardy is assigned to cluster DJ.

3.
J Math Imaging Vis ; 59(2): 187-210, 2017 Oct.
Article in English | MEDLINE | ID: mdl-30233108

ABSTRACT

Transport-based distances, such as the Wasserstein distance and earth mover's distance, have been shown to be an effective tool in signal and image analysis. The success of transport-based distances is in part due to their Lagrangian nature, which allows them to capture the important variations in many signal classes. However, these distances require the signal to be nonnegative and normalized. Furthermore, the signals are considered as measures and compared by redistributing (transporting) them, which does not directly take into account the signal intensity. Here we study a transport-based distance, called the TLp distance, that combines Lagrangian and intensity modelling and is directly applicable to general, non-positive and multi-channelled signals. The distance can be computed by existing numerical methods. We give an overview of the basic properties of this distance and applications to classification, with multi-channelled non-positive one-dimensional signals and two-dimensional images, and color transfer.
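As a rough illustration of how a distance can combine Lagrangian (position) and intensity terms, the sketch below computes a discrete TLp-style distance between two sampled one-dimensional signals. It assumes both signals have the same number of samples with uniform weights, in which case the transport problem reduces to an assignment problem whose cost for pairing sample i with sample j is |x_i - y_j|^p + |f_i - g_j|^p. The function name `tlp_distance` and the brute-force search over permutations are choices made here for clarity; they are not taken from the paper, and the search is only feasible for a handful of samples. A linear assignment solver such as `scipy.optimize.linear_sum_assignment` would be the practical replacement.

```python
from itertools import permutations
from typing import Sequence


def tlp_distance(
    x: Sequence[float],
    f: Sequence[float],
    y: Sequence[float],
    g: Sequence[float],
    p: float = 2.0,
) -> float:
    """Discrete TLp-style distance between signals f (at x) and g (at y).

    Assumes equal sample counts and uniform weights, so the optimal
    transport plan can be taken to be a permutation of the samples.
    """
    n = len(x)
    if n == 0 or not (len(f) == len(y) == len(g) == n):
        raise ValueError("all inputs must be non-empty and of equal length")

    def cost(i: int, j: int) -> float:
        # Position (Lagrangian) term plus intensity term.
        return abs(x[i] - y[j]) ** p + abs(f[i] - g[j]) ** p

    # Brute force over all assignments: illustrative only, O(n!).
    best = min(
        sum(cost(i, s[i]) for i in range(n))
        for s in permutations(range(n))
    )
    return (best / n) ** (1.0 / p)


if __name__ == "__main__":
    grid = [0.0, 1.0, 2.0]
    # Signals may take negative values; no normalization is needed.
    print(tlp_distance(grid, [-1.0, 0.0, 0.0], grid, [0.0, 0.0, -1.0]))
```

Because intensities enter the cost directly, the inputs need not be nonnegative or normalized, which is the property the abstract highlights.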

4.
IEEE Signal Process Mag ; 34(4): 43-59, 2017 Jul.
Article in English | MEDLINE | ID: mdl-29962824

ABSTRACT

Transport-based techniques for signal and data analysis have received increased attention recently. Given their ability to provide accurate generative models for signal intensities and other data distributions, they have been used in a variety of applications, including content-based retrieval, cancer detection, image super-resolution, and statistical machine learning, to name a few, and have been shown to produce state-of-the-art results in several of them. Moreover, the geometric characteristics of transport-related metrics have inspired new kinds of algorithms for interpreting the meaning of data distributions. Here we provide a practical overview of the mathematical underpinnings of mass transport-related methods, including numerical implementation, as well as a review, with demonstrations, of several applications. Software accompanying this tutorial is available at [43].
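For a sense of what a numerical implementation of a transport metric can look like in the simplest setting, the sketch below computes the p-Wasserstein distance between two one-dimensional empirical samples of equal size. In one dimension with uniform weights, the optimal transport plan pairs the sorted samples in order, so no optimization is required. The function name `wasserstein_1d` is hypothetical, and this is not the software accompanying the tutorial.

```python
from typing import Sequence


def wasserstein_1d(xs: Sequence[float], ys: Sequence[float], p: float = 1.0) -> float:
    """p-Wasserstein distance between two equal-size 1D empirical samples.

    Assumes uniform weights; the optimal plan then matches the i-th
    smallest value of xs to the i-th smallest value of ys.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    total = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys)))
    return (total / len(xs)) ** (1.0 / p)


if __name__ == "__main__":
    # Shifting every sample by 1 gives a distance of 1 for p = 1.
    print(wasserstein_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0]))
```

Higher-dimensional problems lose this sorting shortcut and require the general solvers that the tutorial surveys.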
