Results 1 - 15 of 15
1.
Sci Rep ; 13(1): 15100, 2023 09 12.
Article in English | MEDLINE | ID: mdl-37699940

ABSTRACT

This study proposes a method to extract signature bands from deep learning models trained on multispectral data converted from hyperspectral data. The signature bands, together with two deep-learning models, were then used to predict the sugar content of Syzygium samarangense. First, hyperspectral data with bandwidths below 2.5 nm were converted to spectral data with bandwidths above 2.5 nm to simulate multispectral data. A convolutional neural network (CNN) and a feedforward neural network (FNN) used these spectral data to predict the sugar content of Syzygium samarangense and obtained the lowest mean absolute errors (MAEs) of 0.400 °Brix and 0.408 °Brix, respectively. Second, the absolute mean of the integrated gradients was used to extract multiple signature bands from the CNN and FNN models for sugariness prediction. A total of thirty sets of six signature bands were selected from the CNN and FNN models, which were trained using spectral data with five bandwidths in the visible (VIS), visible-to-near-infrared (VISNIR), and visible-to-short-wave-infrared (VISWIR) ranges of 400-700 nm, 400-1000 nm, and 400-1700 nm, respectively. Lastly, these signature-band data were used to train the CNN and FNN models for sugar content prediction. The FNN model using VISWIR signature bands with a bandwidth of ±12.5 nm had the minimum MAE of 0.390 °Brix, and the CNN model using VISWIR signature bands with a bandwidth of ±10 nm had the lowest MAE of 0.549 °Brix among the CNN models. The MAEs of the models with only six spectral bands were even better than those with tens or hundreds of spectral bands. These results reveal that six signature bands have the potential to be used in a small, compact multispectral device to predict the sugar content of Syzygium samarangense.
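The band-selection step described above can be sketched as follows: attribute each spectral band by integrated gradients, average the absolute attributions over samples, and keep the top-scoring bands. This is a minimal illustrative sketch, not the paper's implementation; the toy linear "model" (for which integrated gradients are exact) and all names and values are assumptions for demonstration.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate integrated gradients along the straight-line path
    from `baseline` to `x` (Riemann sum over `steps` points)."""
    alphas = np.linspace(0.0, 1.0, steps)
    grads = np.stack([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

def signature_bands(grad_fn, X, n_bands=6, baseline=None):
    """Rank spectral bands by the absolute mean of integrated gradients
    over all samples and keep the top `n_bands`."""
    if baseline is None:
        baseline = np.zeros(X.shape[1])
    attr = np.stack([integrated_gradients(grad_fn, x, baseline) for x in X])
    score = np.abs(attr).mean(axis=0)   # absolute-mean attribution per band
    return np.argsort(score)[::-1][:n_bands]

# Toy example: a linear "model" whose gradient is its weight vector, so the
# bands carrying nonzero weight should be the ones selected.
rng = np.random.default_rng(0)
w = np.zeros(20)
w[[3, 7, 15]] = [2.0, -3.0, 1.5]
X = rng.uniform(0.5, 1.0, size=(32, 20))     # toy reflectance spectra
bands = signature_bands(lambda x: w, X, n_bands=3)
```

For a real CNN or FNN, `grad_fn` would return the gradient of the predicted sugariness with respect to the input spectrum (e.g., via automatic differentiation).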


Subject(s)
Pentaerythritol Tetranitrate , Syzygium , Sugars , Neural Networks, Computer , Radio Waves
2.
IEEE Trans Pattern Anal Mach Intell ; 45(2): 1906-1918, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35344486

ABSTRACT

Learning the hidden dynamics from sequence data is crucial. An attention mechanism can be introduced to spotlight the region of interest for sequential learning. Traditional attention is measured between a query and a sequence based on a discrete-time state trajectory, and such a mechanism cannot characterize irregularly sampled sequence data. This paper presents an attentive differential network (ADN) in which attention over continuous-time dynamics is developed. The continuous-time attention is performed over the dynamics at all times, so the missing information in irregular or sparse samples can be seamlessly compensated and attended. Self-attention is computed to find the attended state trajectory. However, the memory cost of the attention scores between a query and a sequence is demanding, since self-attention treats all time instants as query points in an ordinary differential equation solver. This issue is tackled by imposing a causality constraint in the causal ADN (CADN), where the query is merged up to the current time. To enhance model robustness, this study further explores a latent CADN in which the attended dynamics are calculated in an encoder-decoder structure via Bayesian learning. Experiments on irregularly sampled actions, dialogues, and bio-signals illustrate the merits of the proposed methods in action recognition, emotion recognition, and mortality prediction, respectively.
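The causality constraint mentioned above restricts each query to information up to the current time. A minimal discrete-time sketch of that idea is causal (masked) self-attention; this is an assumption-laden simplification in numpy and not the paper's continuous-time ODE formulation.

```python
import numpy as np

def causal_attention(Q, K, V):
    """Scaled dot-product self-attention with a causal mask:
    the query at time t may only attend to keys at times <= t."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (T, T) attention logits
    mask = np.triu(np.ones_like(scores), k=1)      # 1 strictly above diagonal
    scores = np.where(mask == 1, -np.inf, scores)  # block future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

T, d = 5, 8
rng = np.random.default_rng(1)
X = rng.standard_normal((T, d))
out, W = causal_attention(X, X, X)   # self-attention: Q = K = V = X
```

Because the first time step can only attend to itself, its output equals its own value vector; every row of the attention matrix sums to one and has zeros above the diagonal.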

3.
Article in English | MEDLINE | ID: mdl-35839196

ABSTRACT

Face reenactment aims to generate talking face images of a target person given a face image of a source person. It is crucial to learn a latent disentanglement to tackle such a challenging task through domain mapping between source and target images. The attributes or talking features due to domains or conditions become adjustable so that target images can be generated from source images. This article presents an information-theoretic attribute factorization (AF) in which the mixed features are disentangled for flow-based face reenactment. The latent variables of the flow model are factorized into attribute-relevant and attribute-irrelevant components without the need for paired face images. In particular, domain knowledge is learned to provide the condition for identifying the talking attributes from real face images. The AF is guided by multiple losses for source structure, target structure, random-pair reconstruction, and sequential classification. The random-pair reconstruction loss is calculated by exchanging the attribute-relevant components within a sequence of face images. In addition, a new mutual information flow is constructed for disentanglement toward domain mapping, condition irrelevance, and condition relevance. The disentangled features are learned and controlled to generate image sequences with meaningful interpretation. Experiments on mouth reenactment illustrate the merits of the individual and hybrid models for conditional generation and mapping based on the informative AF.

4.
Sci Rep ; 12(1): 2774, 2022 02 17.
Article in English | MEDLINE | ID: mdl-35177733

ABSTRACT

Sugariness is one of the most important indicators of the quality of Syzygium samarangense, also known as the wax apple. In general, farmers measure sugariness by testing the extracted juice of wax apple products. Such a destructive measurement is not only labor-intensive but also wastes product, so non-destructive, rapid techniques for measuring sugariness would be significant for wax apple supply chains. Traditionally, non-destructive prediction of sugariness or other fruit indicators was based on reflectance spectra or hyperspectral images (HSIs) using linear regression, such as multi-linear regression (MLR), principal component regression (PCR), and partial least squares regression (PLSR). However, these regression methods are usually too simple to precisely estimate the complicated mapping between the reflectance spectra or HSIs and the sugariness. This study presents deep learning methods for sugariness prediction using reflectance spectra or HSIs from the bottom of the wax apple. A non-destructive imaging system fabricated with two spectrum sensors and light sources is implemented to acquire visible and infrared light over a range of wavelengths. In particular, a specialized convolutional neural network (CNN) with hyperspectral imaging is proposed by investigating the effect of different wavelength bands on sugariness prediction. Rather than extracting spatial features, the proposed CNN model was designed to extract spectral features of the HSIs. In the experiments, the ground-truth sugariness is obtained from a commercial refractometer. The experimental results show that using the whole band range between 400 and 1700 nm achieves the best performance in terms of °Brix error: the CNN model attains a °Brix error of ±0.552, smaller than the ±0.597 of the feedforward neural network (FNN).
Significantly, the CNN's test results show that the errors in the intervals 0-10 °Brix and 10-11 °Brix are ±0.551 and ±0.408, respectively, indicating that the model can predict whether sugariness is below 10 °Brix, much as the human tongue can. These results are much better than the ±1.441 and ±1.379 obtained with PCR and PLSR, respectively. Moreover, this study reports the test error in each one-°Brix interval, and the results show that the test error varies considerably across °Brix intervals, especially for PCR and PLSR, whereas FNN and CNN obtain robust test errors.
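Extracting spectral rather than spatial features, as the abstract describes, amounts to convolving along the wavelength axis of each pixel's spectrum. The following is a minimal sketch of that operation in numpy; the kernel values and spectrum are illustrative assumptions, not the paper's learned weights or data.

```python
import numpy as np

def conv1d(spectrum, kernel):
    """'Valid' 1-D convolution along the wavelength axis, the basic
    operation of a spectral (rather than spatial) CNN layer."""
    k = len(kernel)
    return np.array([spectrum[i:i + k] @ kernel
                     for i in range(len(spectrum) - k + 1)])

def relu(x):
    return np.maximum(x, 0.0)

# One spectral feature map: filter the reflectance spectrum with a
# fixed, illustrative 3-tap kernel, then apply a ReLU activation.
spectrum = np.array([0.2, 0.4, 0.9, 0.8, 0.3, 0.1])  # reflectance per band
kernel = np.array([0.25, 0.5, 0.25])
features = relu(conv1d(spectrum, kernel))
```

In a trained spectral CNN, many such kernels would be learned, each responding to a different local shape in the reflectance curve.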

5.
IEEE Trans Neural Netw Learn Syst ; 33(5): 2236-2245, 2022 05.
Article in English | MEDLINE | ID: mdl-33373306

ABSTRACT

Domain adaptation aims to reduce the mismatch between the source and target domains. A domain adversarial network (DAN) has recently been proposed to incorporate adversarial learning into deep neural networks to create a domain-invariant space. However, the DAN's major drawback is that it is difficult to find the domain-invariant space using a single feature extractor. In this article, we propose to split the feature extractor into two contrastive branches, with one branch responsible for class-dependence in the latent space and the other focusing on domain-invariance. The feature extractor achieves these contrastive goals by sharing the first and last hidden layers while possessing decoupled branches in the middle hidden layers. To encourage the feature extractor to produce class-discriminative embedded features, the label predictor is adversarially trained to produce equal posterior probabilities across all of its outputs instead of one-hot outputs. We refer to the resulting network as the contrastive adversarial domain adaptation network (CADAN). We evaluated the embedded features' domain-invariance via a series of speaker identification experiments under both clean and noisy conditions. Results demonstrate that the embedded features produced by CADAN lead to a 33% improvement in speaker identification accuracy compared with the conventional DAN.
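The adversarial target of "equal posterior probabilities across all outputs" can be written as the cross-entropy between the predicted posterior and the uniform distribution, which is minimized exactly when every class receives equal probability. The sketch below is an illustrative loss in numpy under that assumption, not the paper's training code.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def uniform_posterior_loss(logits):
    """Cross-entropy between the predicted posterior and the uniform
    distribution over C classes: minimized (at log C) when all classes
    get equal probability, i.e., the adversarial target for the
    label predictor described in the abstract."""
    p = softmax(logits)
    C = logits.shape[-1]
    return -(np.log(p) / C).sum(axis=-1).mean()

confident = np.array([[8.0, 0.0, 0.0]])   # near one-hot posterior
flat = np.array([[1.0, 1.0, 1.0]])        # exactly uniform posterior
```

A confident, near one-hot prediction is penalized more heavily than a flat one, which pushes the feature extractor and predictor toward the uniform-posterior behavior.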


Subject(s)
Neural Networks, Computer , Recognition, Psychology , Learning
6.
IEEE Trans Pattern Anal Mach Intell ; 44(9): 4975-4986, 2022 09.
Article in English | MEDLINE | ID: mdl-33755556

ABSTRACT

It is important and challenging to infer stochastic latent semantics for natural language applications. The difficulty in stochastic sequential learning is caused by posterior collapse in variational inference, where the input sequence is disregarded in the estimated latent variables. This paper proposes three components to tackle this difficulty and builds the variational sequence autoencoder (VSAE), in which sufficient latent information is learned for sophisticated sequence representation. First, complementary encoders based on a long short-term memory (LSTM) network and a pyramid bidirectional LSTM are merged to characterize the global and structural dependencies of an input sequence, respectively. Second, a stochastic self-attention mechanism is incorporated into the recurrent decoder; the latent information is attended to encourage interaction between inference and generation in the encoder-decoder training procedure. Third, an autoregressive Gaussian prior on the latent variable is used to preserve the information bound. Different variants of the VSAE are proposed to mitigate posterior collapse in sequence modeling. A series of experiments demonstrates that the proposed individual and hybrid sequence autoencoders substantially improve performance in variational sequential learning for language modeling and semantic understanding in document classification and summarization.


Subject(s)
Algorithms , Neural Networks, Computer , Learning , Normal Distribution , Semantics
7.
Sensors (Basel) ; 20(24)2020 Dec 17.
Article in English | MEDLINE | ID: mdl-33348786

ABSTRACT

Myocardial infarction (MI) is one of the most prevalent cardiovascular diseases worldwide, and many patients suffer from MI without being aware of it. Therefore, early diagnosis and timely treatment are crucial to protect the lives of MI patients. Most wearable monitoring devices provide only single-lead electrocardiography (ECG), a major limitation for their applicability to MI diagnosis. Incorporating derived vectorcardiography (VCG) techniques can help monitor the three-dimensional electrical activity of the human heart. This study presents a patient-specific reconstruction method based on a long short-term memory (LSTM) network that exploits both intra- and inter-lead correlations of ECG signals. MI-induced changes in the morphological and temporal wave features are extracted from the derived VCG using spline approximation. After feature extraction, a classifier based on a multilayer perceptron network is used for MI classification. Experiments on the PTB diagnostic database demonstrate that the proposed system achieves satisfactory performance in differentiating MI patients from healthy subjects and in localizing the infarcted area.


Subject(s)
Myocardial Infarction , Signal Processing, Computer-Assisted , Vectorcardiography , Electrocardiography , Heart , Humans , Myocardial Infarction/diagnosis , Neural Networks, Computer
9.
IEEE Trans Neural Netw Learn Syst ; 29(5): 1998-2011, 2018 05.
Article in English | MEDLINE | ID: mdl-28436897

ABSTRACT

Growing interest in multiway data analysis and deep learning has made tensor factorization (TF) and neural networks (NNs) crucial topics. Conventionally, an NN model is estimated from a set of one-way observations. Such a vectorized NN does not generalize to learning representations from multiway observations: its classification performance is constrained because the temporal or spatial information in neighboring ways is disregarded, and more parameters are required to learn the complicated data structure. This paper presents a new tensor-factorized NN (TFNN), which tightly integrates TF and NN for multiway feature extraction and classification under a unified discriminative objective. The TFNN can be seen as a generalized NN in which the affine transformation is replaced by a multilinear, multiway factorization. The multiway information is preserved through layer-wise factorization, with Tucker decomposition and a nonlinear activation performed in each hidden layer. Tensor-factorized error backpropagation is developed to train the TFNN with limited parameter size and computation time. The TFNN can be further extended to a convolutional TFNN (CTFNN) by looking at small subtensors through factorized convolution. Experiments on real-world classification tasks demonstrate that TFNN and CTFNN attain substantial improvements compared with an NN and a convolutional NN, respectively.
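The multilinear map that replaces the affine transformation can be illustrated with mode-n products against per-mode factor matrices, the core operation in a Tucker-style layer. This is a hedged sketch of such a hidden layer in numpy; the layer sizes, activation choice, and structure are illustrative assumptions rather than the TFNN architecture itself.

```python
import numpy as np

def mode_n_product(T, M, n):
    """Multiply tensor `T` by matrix `M` along mode `n`: the mode-n
    dimension of size M.shape[1] is mapped to size M.shape[0]."""
    T = np.moveaxis(T, n, 0)
    shape = T.shape
    out = M @ T.reshape(shape[0], -1)
    return np.moveaxis(out.reshape((M.shape[0],) + shape[1:]), 0, n)

def tucker_layer(X, factors, activation=np.tanh):
    """One tensor-factorized hidden layer: a mode-n product with each
    factor matrix, followed by a nonlinear activation."""
    for n, U in enumerate(factors):
        X = mode_n_product(X, U, n)
    return activation(X)

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 5, 6))            # a three-way input tensor
factors = [rng.standard_normal((2, 4)),       # one factor per mode
           rng.standard_normal((3, 5)),
           rng.standard_normal((2, 6))]
H = tucker_layer(X, factors)                  # hidden tensor of shape (2, 3, 2)
```

Note how the layer needs only 2·4 + 3·5 + 2·6 = 35 weights, whereas a vectorized affine layer mapping 120 inputs to 12 outputs would need 1440, which is the parameter saving the multiway factorization buys.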

10.
IEEE Trans Pattern Anal Mach Intell ; 40(2): 318-331, 2018 02.
Article in English | MEDLINE | ID: mdl-28278458

ABSTRACT

Deep unfolding provides an approach to integrating probabilistic generative models with deterministic neural networks, benefiting from deep representation, easy interpretation, flexible learning, and stochastic modeling. This study develops unsupervised and supervised learning of deep unfolded topic models for document representation and classification. Conventionally, unsupervised and supervised topic models are inferred via variational inference, where the model parameters are estimated by maximizing a lower bound on the log marginal likelihood of input documents without and with class labels, respectively. The representation capability or classification accuracy is constrained by the variational lower bound and by the model parameters being tied across the inference procedure. This paper aims to relax these constraints by directly maximizing the end performance criterion and continuously untying the parameters during learning via deep unfolding inference (DUI). The inference procedure is treated as layer-wise learning in a deep neural network, and the end performance is iteratively improved by using the estimated topic parameters according to exponentiated updates. Deep learning of topic models is therefore implemented through a back-propagation procedure. Experimental results show the merits of DUI with an increasing number of layers compared with variational inference in both unsupervised and supervised topic models.

11.
Med Phys ; 44(12): 6690-6705, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29034482

ABSTRACT

PURPOSE: To investigate deep reinforcement learning (DRL) based on historical treatment plans for developing automated radiation adaptation protocols for non-small cell lung cancer (NSCLC) patients, aiming to maximize tumor local control at reduced rates of grade 2 radiation pneumonitis (RP2). METHODS: In a retrospective population of 114 NSCLC patients who received radiotherapy, a three-component neural network framework was developed for DRL of dose fractionation adaptation. Large-scale patient characteristics included clinical, genetic, and imaging radiomics features in addition to tumor and lung dosimetric variables. First, a generative adversarial network (GAN) was employed to learn, from a relatively limited sample size, the patient population characteristics necessary for DRL training. Second, a radiotherapy artificial environment (RAE) was reconstructed by a deep neural network (DNN) utilizing both original and GAN-synthesized data to estimate the transition probabilities for adapting personalized radiotherapy treatment courses. Third, a deep Q-network (DQN) was applied to the RAE to choose the optimal dose in a response-adapted treatment setting. This multicomponent reinforcement learning approach was benchmarked against real clinical decisions applied in an adaptive dose escalation clinical protocol, in which 34 patients were treated based on avid PET signal in the tumor and constrained by a 17.2% normal tissue complication probability (NTCP) limit for RP2. The uncomplicated cure probability (P+) was used as a baseline reward function in the DRL. RESULTS: Taking our adaptive dose escalation protocol as a blueprint for the proposed DRL (GAN + RAE + DQN) architecture, we obtained an automated dose adaptation estimate for use at approximately two-thirds of the way into the radiotherapy treatment course.
By letting the DQN component freely control the estimated adaptive dose per fraction (ranging from 1 to 5 Gy), the DRL automatically favored dose escalation/de-escalation between 1.5 and 3.8 Gy, a range similar to that used in the clinical protocol. The same DQN yielded two patterns of dose escalation for the 34 test patients under different reward variants. First, using the baseline P+ reward function, the individual adaptive fraction doses of the DQN had tendencies similar to the clinical data, with an RMSE of 0.76 Gy, although the adaptations suggested by the DQN were generally lower in magnitude (less aggressive). Second, by adjusting the P+ reward function to place greater emphasis on mitigating local failure, better matching of doses between the DQN and the clinical protocol was achieved, with an RMSE of 0.5 Gy; moreover, the decisions selected by the DQN showed better concordance with patients' eventual outcomes. In comparison, the traditional temporal difference (TD) algorithm for reinforcement learning yielded an RMSE of 3.3 Gy due to numerical instabilities and insufficient learning. CONCLUSION: We demonstrated that automated dose adaptation by DRL is a feasible and promising approach for achieving results similar to those chosen by clinicians. The process may require customization of the reward function if individual cases are to be considered. However, development of this framework into a fully credible autonomous system for clinical decision support would require further validation on larger multi-institutional datasets.
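At the heart of the DQN component is the Q-learning temporal-difference update, which moves the value of a state-action pair toward the observed reward plus the discounted best value of the next state. The tabular sketch below is a drastic simplification of the paper's deep network, and the state/action sizes, learning rate, and reward are purely illustrative.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One temporal-difference (Q-learning) update: move Q(s, a)
    toward the reward plus the discounted best next-state value."""
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

# Toy setting: 3 treatment states, 5 discrete dose actions.
Q = np.zeros((3, 5))
# A reward is observed after taking dose action 2 in state 0, landing in state 1.
Q = q_update(Q, s=0, a=2, r=1.0, s_next=1)
```

A DQN replaces the table with a neural network approximating Q(s, a), which is what allows the high-dimensional patient features described in the METHODS to serve as the state.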


Subject(s)
Lung Neoplasms/radiotherapy , Neural Networks, Computer , Automation
12.
IEEE Trans Neural Netw Learn Syst ; 27(2): 361-74, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26625430

ABSTRACT

A language model (LM) calculates the probability of a word sequence and provides the solution to word prediction in a variety of information systems. A recurrent neural network (RNN) is powerful for learning the large-span dynamics of a word sequence in continuous space. However, training an RNN-LM is an ill-posed problem because of the many parameters arising from a large dictionary and a high-dimensional hidden layer. This paper presents a Bayesian approach to regularizing the RNN-LM and applies it to continuous speech recognition. We aim to penalize an overly complex RNN-LM by compensating for the uncertainty of the estimated model parameters, represented by a Gaussian prior. The objective function in the Bayesian classification network is formed as a regularized cross-entropy error function. The regularized model is constructed not only by calculating the regularized parameters according to the maximum a posteriori criterion but also by estimating the Gaussian hyperparameter through maximizing the marginal likelihood. A rapid approximation to the Hessian matrix is developed to implement the Bayesian RNN-LM (BRNN-LM) by selecting a small set of salient outer products. The proposed BRNN-LM achieves a sparser model than the RNN-LM. Experiments on different corpora show robust system performance when applying the rapid BRNN-LM under different conditions.
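A zero-mean Gaussian prior on the weights turns the maximum a posteriori objective into cross-entropy plus a quadratic (L2) penalty whose weight acts as the prior precision. The sketch below illustrates that regularized objective on toy values; the specific numbers and variable names are assumptions for demonstration, not the paper's setup.

```python
import numpy as np

def regularized_cross_entropy(p_true, p_pred, w, lam):
    """Cross-entropy error plus the Gaussian-prior (L2) penalty: the MAP
    objective sketched for a Bayesian LM, where `lam` plays the role of
    the prior precision hyperparameter."""
    ce = -np.sum(p_true * np.log(p_pred))
    return ce + 0.5 * lam * np.sum(w ** 2)

p_true = np.array([0.0, 1.0, 0.0])   # one-hot target word
p_pred = np.array([0.2, 0.7, 0.1])   # model's predicted distribution
w = np.array([0.5, -0.5])            # toy parameter vector

loss_plain = regularized_cross_entropy(p_true, p_pred, w, lam=0.0)  # ML objective
loss_map = regularized_cross_entropy(p_true, p_pred, w, lam=2.0)    # MAP objective
```

In the full Bayesian treatment, `lam` itself is estimated by maximizing the marginal likelihood rather than being fixed by hand.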

13.
IEEE Trans Neural Netw Learn Syst ; 27(3): 565-78, 2016 Mar.
Article in English | MEDLINE | ID: mdl-25838529

ABSTRACT

Considering the hierarchical data groupings in a text corpus, e.g., words, sentences, and documents, we conduct structural learning and infer the latent themes and topics of sentences and words, respectively, from a collection of documents. The relation between themes and topics under different data groupings is explored through an unsupervised procedure without limiting the number of clusters. A tree stick-breaking process is presented to draw theme proportions for different sentences. We build a hierarchical theme and topic model that flexibly represents heterogeneous documents using Bayesian nonparametrics, from which thematic sentences and topical words are extracted. In the experiments, the proposed method is shown to be effective in building a semantic tree structure over sentences and their corresponding words, and the superiority of the tree model for selecting expressive sentences for document summarization is illustrated.
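The stick-breaking process underlying the model can be sketched in a few lines: repeatedly break off a Beta-distributed fraction of the remaining stick so that the pieces form mixture weights without fixing the number of clusters in advance. This is the standard flat construction, shown as a minimal sketch; the tree variant in the paper applies such breaks recursively per tree node.

```python
import numpy as np

def stick_breaking(alpha, n_atoms, rng):
    """Draw (truncated) mixture weights by the stick-breaking
    construction: break off a Beta(1, alpha) fraction of whatever
    stick length remains, n_atoms times."""
    betas = rng.beta(1.0, alpha, size=n_atoms)
    # Length of stick remaining before each break: 1, (1-b1), (1-b1)(1-b2), ...
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    return betas * remaining

rng = np.random.default_rng(3)
weights = stick_breaking(alpha=2.0, n_atoms=10, rng=rng)
```

Smaller `alpha` concentrates mass on the first few pieces (fewer effective clusters); the truncated weights sum to slightly less than one, with the remainder standing in for the infinitely many unbroken pieces.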

14.
IEEE Trans Neural Netw Learn Syst ; 24(5): 681-94, 2013 May.
Article in English | MEDLINE | ID: mdl-24808420

ABSTRACT

Independent component analysis (ICA) is a popular approach for blind source separation in which the mixing process is assumed to be unchanged, with a fixed set of stationary source signals. However, the mixing system and source signals are nonstationary in real-world applications; e.g., the source signals may abruptly appear or disappear, and the sources may be replaced by new ones or even move over time. This paper presents an online learning algorithm for the Gaussian process (GP) and establishes a separation procedure in the presence of nonstationary and temporally correlated mixing coefficients and source signals. In this procedure, we capture the evolving statistics of sequential signals through online Bayesian learning. The activity of nonstationary sources is reflected by automatic relevance determination, which is incrementally estimated at each frame and continuously propagated to the next. We employ the GP to characterize the temporal structures of the time-varying mixing coefficients and source signals. A variational Bayesian inference is developed to approximate the true posterior, estimating the nonstationary ICA parameters and characterizing the activity of the latent sources. The differences between this ICA method and sequential Monte Carlo ICA are illustrated. In the experiments, the proposed algorithm outperforms other ICA methods for the separation of audio signals under different nonstationary scenarios.

15.
IEEE Trans Pattern Anal Mach Intell ; 30(4): 606-16, 2008 Apr.
Article in English | MEDLINE | ID: mdl-18276967

ABSTRACT

This paper presents a hybrid framework of feature extraction and hidden Markov modeling (HMM) for two-dimensional pattern recognition. Importantly, we explore a new discriminative training criterion to ensure model compactness and discriminability. This criterion is derived from hypothesis test theory by maximizing the confidence of accepting the hypothesis that observations come from target HMM states rather than competing HMM states. Accordingly, we develop maximum confidence hidden Markov modeling (MC-HMM) for face recognition. Under this framework, we merge a transformation matrix to extract discriminative facial features, and the closed-form solutions to the continuous-density HMM parameters are formulated. Attractively, the hybrid MC-HMM parameters are estimated under the same criterion and converge through the expectation-maximization procedure. From experiments on the FERET and GTFD facial databases, we find that the proposed method obtains robust segmentation in the presence of different facial expressions, orientations, etc. In comparison with maximum likelihood and minimum classification error HMMs, the proposed MC-HMM achieves higher recognition accuracies with lower feature dimensions.


Subject(s)
Algorithms , Artificial Intelligence , Biometry/methods , Face/anatomy & histology , Image Interpretation, Computer-Assisted/methods , Models, Biological , Pattern Recognition, Automated/methods , Computer Simulation , Humans , Image Enhancement/methods , Likelihood Functions , Markov Chains , Models, Statistical , Reproducibility of Results , Sensitivity and Specificity