1.
Sensors (Basel); 22(12), 2022 Jun 12.
Article in English | MEDLINE | ID: mdl-35746234

ABSTRACT

The train horn is an active audible warning signal used to alert commuters and railway employees to oncoming trains, supporting smooth operation and traffic safety, especially at barrier-free crossings. This work studies deep learning-based approaches to develop a system for the early detection of train arrival based on recognizing train horn sounds in the traffic soundscape. A custom dataset of train horn sounds, car horn sounds, and traffic noises is developed to conduct experiments and analysis. We propose a novel two-stream end-to-end CNN model (i.e., THD-RawNet), which combines two approaches to feature extraction from raw audio waveforms, for audio classification in train horn detection (THD). Besides a stream with a sequential one-dimensional CNN (1D-CNN), as in existing sound classification works, we propose to use multiple 1D-CNN branches that process the raw waveform at different temporal resolutions and extract an image-like representation for the 2D-CNN classification part. Our experimental results and comparative analysis demonstrate the effectiveness of the proposed two-stream network and of combining features extracted at multiple temporal resolutions. THD-RawNet achieved better accuracy and robustness than baseline models trained on either raw audio or handcrafted features: at an input size of one second, the network yielded 95.11% accuracy on test data in normal traffic conditions and remained above 93% accuracy under the considerably noisy condition of -10 dB SNR. The proposed THD system can be integrated into smart railway crossing systems, private cars, and self-driving cars to improve railway transit safety.


Subject(s)
Automobiles , Noise , Humans
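
As a rough illustration of the two-stream idea described in the abstract above, the following PyTorch sketch feeds the same raw waveform through a sequential 1D-CNN stream and through several parallel 1D-CNN branches with different strides (i.e., temporal resolutions), stacks the branch outputs into an image-like map for a small 2D-CNN, and concatenates both streams for a three-class decision (train horn, car horn, traffic noise). The layer widths, kernel sizes, pooling choices, and class count are illustrative assumptions, not the published THD-RawNet configuration.

```python
# Minimal sketch of a two-stream raw-waveform classifier in PyTorch.
# Layer sizes, kernel widths, and the 3-class output are illustrative
# assumptions, not the published THD-RawNet configuration.
import torch
import torch.nn as nn

class TwoStreamRawNet(nn.Module):
    def __init__(self, n_classes=3):
        super().__init__()
        # Stream 1: sequential 1D-CNN on the raw waveform.
        self.stream1 = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=8), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=32, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        # Stream 2: parallel 1D-CNN branches with different strides,
        # i.e. different temporal resolutions of the same waveform.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(1, 1, kernel_size=k, stride=s, padding=k // 2),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(128))
            for k, s in [(32, 2), (64, 4), (128, 8), (256, 16)]
        ])
        # Branch outputs are stacked into an image-like map
        # (branches x 128) and classified by a small 2D-CNN.
        self.cnn2d = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((2, 16)),
            nn.Flatten(),
        )
        self.head = nn.Linear(32 + 8 * 2 * 16, n_classes)

    def forward(self, wave):                  # wave: (batch, 1, samples)
        f1 = self.stream1(wave).flatten(1)    # (batch, 32)
        maps = torch.stack([b(wave).squeeze(1) for b in self.branches], dim=1)
        f2 = self.cnn2d(maps.unsqueeze(1))    # (batch, 8 * 2 * 16)
        return self.head(torch.cat([f1, f2], dim=1))

model = TwoStreamRawNet()
logits = model(torch.randn(4, 1, 16000))      # 1 s of 16 kHz audio per clip
print(logits.shape)                           # torch.Size([4, 3])
```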
2.
J Acoust Soc Am; 120(3): 1631-45, 2006 Sep.
Article in English | MEDLINE | ID: mdl-17004485

ABSTRACT

This paper investigates the problem of how to partition unknown speech utterances into a set of clusters, such that each cluster consists of utterances from only one speaker, and the number of clusters reflects the unknown speaker population size. The proposed method begins by specifying a certain number of clusters, corresponding to one of the possible speaker population sizes, and then maximizes the level of overall within-cluster homogeneity of the speakers' voice characteristics. The within-cluster homogeneity is characterized by the likelihood probability that a cluster model, trained using all the utterances within a cluster, matches each of the within-cluster utterances. To attain the maximal sum of likelihood probabilities for all utterances, the proposed method applies a genetic algorithm to determine the cluster in which each utterance should be located. For greater computational efficiency, also proposed is a clustering criterion that approximates the likelihood probability with a divergence-based model similarity between a cluster and each of the within-cluster utterances. The clustering method then examines various legitimate numbers of clusters by adapting the Bayesian information criterion to determine the most likely speaker population size. The experimental results show the superiority of the proposed method over conventional methods based on hierarchical clustering.


Subject(s)
Algorithms , Models, Statistical , Speech/physiology , Voice/physiology , Bayes Theorem , Cluster Analysis , Humans , Population
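
The clustering procedure described in the abstract above can be illustrated with a simplified sketch: each cluster is modeled by a single diagonal Gaussian fit to the pooled frames of its utterances, a genetic algorithm evolves assignment vectors to maximize the total within-cluster log-likelihood, and a BIC-style penalty selects the most likely number of speakers. The single-Gaussian cluster models, the GA operators and settings, and the penalty weight are simplifying assumptions rather than the paper's exact divergence-based formulation.

```python
# Simplified sketch of likelihood-based speaker clustering with a genetic
# algorithm and BIC model selection. Single-Gaussian cluster models and the
# GA settings are illustrative assumptions, not the paper's formulation.
import numpy as np

rng = np.random.default_rng(0)

def gaussian_loglik(frames, mean, var):
    """Total log-likelihood of feature frames under a diagonal Gaussian."""
    var = np.maximum(var, 1e-6)
    ll = -0.5 * (np.log(2 * np.pi * var) + (frames - mean) ** 2 / var)
    return ll.sum()

def fitness(assign, utterances, k):
    """Sum of each utterance's likelihood under its own cluster's model."""
    total = 0.0
    for c in range(k):
        members = [u for u, a in zip(utterances, assign) if a == c]
        if not members:
            return -np.inf                       # penalize empty clusters
        pooled = np.vstack(members)
        mean, var = pooled.mean(axis=0), pooled.var(axis=0)
        total += sum(gaussian_loglik(u, mean, var) for u in members)
    return total

def ga_cluster(utterances, k, pop=40, gens=100, mut=0.1):
    """Evolve assignment vectors (one cluster label per utterance)."""
    n = len(utterances)
    popu = rng.integers(0, k, size=(pop, n))
    for _ in range(gens):
        scores = np.array([fitness(ind, utterances, k) for ind in popu])
        popu = popu[np.argsort(scores)[::-1]]    # elitist selection
        children = []
        while len(children) < pop // 2:
            a, b = popu[rng.integers(0, pop // 2, size=2)]
            cut = rng.integers(1, n)
            child = np.concatenate([a[:cut], b[cut:]])   # one-point crossover
            flip = rng.random(n) < mut                   # mutation
            child[flip] = rng.integers(0, k, size=flip.sum())
            children.append(child)
        popu = np.vstack([popu[: pop - len(children)], children])
    scores = np.array([fitness(ind, utterances, k) for ind in popu])
    return popu[scores.argmax()], scores.max()

def select_population_size(utterances, k_range, dim):
    """Pick the number of speakers with a BIC-style penalized likelihood."""
    n_frames = sum(len(u) for u in utterances)
    best = None
    for k in k_range:
        _, ll = ga_cluster(utterances, k)
        n_params = k * 2 * dim                   # mean + variance per cluster
        bic = ll - 0.5 * n_params * np.log(n_frames)
        if best is None or bic > best[1]:
            best = (k, bic)
    return best[0]

# Toy data: 12 "utterances" of MFCC-like frames from 3 synthetic speakers.
utts = [rng.normal(loc=spk, scale=1.0, size=(50, 12))
        for spk in (0.0, 3.0, -3.0) for _ in range(4)]
print("estimated speakers:", select_population_size(utts, range(2, 6), dim=12))
```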