Results 1 - 11 of 11
1.
IEEE Trans Pattern Anal Mach Intell ; 46(4): 1981-1995, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37862277

ABSTRACT

This paper presents a network referred to as SoftGroup for accurate and scalable 3D instance segmentation. Existing state-of-the-art methods produce hard semantic predictions followed by grouping to obtain instance segmentation results. Unfortunately, errors stemming from hard decisions propagate into the grouping, resulting in poor overlap between predicted instances and ground truth as well as substantial false positives. To address these problems, SoftGroup allows each point to be associated with multiple classes to mitigate the uncertainty stemming from semantic prediction. It also suppresses false-positive instances by learning to categorize them as background. Regarding scalability, existing fast methods require computational time on the order of tens of seconds on large-scale scenes, which is far from real-time. Our finding is that the k-Nearest Neighbor (k-NN) module, which serves as the prerequisite for grouping, introduces a computational bottleneck. SoftGroup is extended to resolve this bottleneck; the extension is referred to as SoftGroup++. The proposed SoftGroup++ reduces time complexity with octree k-NN and reduces the search space with class-aware pyramid scaling and late devoxelization. Experimental results on various indoor and outdoor datasets demonstrate the efficacy and generality of the proposed SoftGroup and SoftGroup++. Their performance surpasses the best-performing baseline by a large margin (6% to 16%) in terms of AP50. On datasets with large-scale scenes, SoftGroup++ achieves a 6× speed boost on average compared to SoftGroup. Furthermore, SoftGroup can be extended to perform object detection and panoptic segmentation with nontrivial improvements over existing methods.
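The soft-assignment step can be pictured with a toy sketch (hypothetical scores and threshold; not the authors' implementation): a point joins the candidate set of every class whose semantic score exceeds a threshold, rather than only its argmax class, so grouping never commits to a single hard decision.

```python
import numpy as np

# Hypothetical per-point semantic scores for 5 points over 3 classes.
scores = np.array([
    [0.70, 0.25, 0.05],
    [0.45, 0.40, 0.15],   # ambiguous point: kept for classes 0 and 1
    [0.10, 0.80, 0.10],
    [0.33, 0.33, 0.34],   # near-uniform point: kept for all classes
    [0.05, 0.05, 0.90],
])
tau = 0.3  # illustrative score threshold

# Soft grouping: a point enters the candidate set of EVERY class whose
# score exceeds tau; hard grouping keeps only the argmax class.
soft_groups = {c: np.where(scores[:, c] > tau)[0].tolist() for c in range(3)}
hard_groups = {c: np.where(scores.argmax(1) == c)[0].tolist() for c in range(3)}
```

Ambiguous points thus survive into several class-wise grouping passes, and a later stage can learn to discard the spurious candidates as background.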

2.
Adv Mater ; 35(26): e2301627, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36960816

ABSTRACT

Wearable blood-pressure (BP) sensors have recently attracted attention as healthcare devices for continuous non-invasive arterial pressure (CNAP) monitoring. However, the accuracy of wearable BP monitoring devices has been controversial due to the low signal quality of the sensors, the absence of an accurate transfer function to convert sensor signals into BP values, and the lack of clinical validation of measurement precision. Here, a wearable piezoelectric blood-pressure sensor (WPBPS) is reported, which achieves a high normalized sensitivity (0.062 kPa⁻¹) and a fast response time (23 ms) for CNAP monitoring. A linear regression model is designed as the transfer function, offering a simple solution to convert the flexible piezoelectric sensor signals into BP values. To verify the measurement accuracy of the WPBPS, clinical trials were performed on 35 subjects aged from their 20s to 80s after screening. The mean difference between the WPBPS and a commercial sphygmomanometer over 175 BP data pairs is -0.89 ± 6.19 and -0.32 ± 5.28 mmHg for systolic blood pressure (SBP) and diastolic blood pressure (DBP), respectively. By building a WPBPS-embedded wristwatch, the promise of a convenient, portable, continuous BP monitoring system for cardiovascular disease diagnosis is demonstrated.
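A linear-regression transfer function of this kind can be sketched as follows (the calibration amplitudes and pressures below are invented for illustration; the abstract does not specify the actual regressors):

```python
import numpy as np

# Hypothetical calibration data: sensor pulse amplitudes (a.u.) paired
# with cuff-measured systolic pressures (mmHg). Values are illustrative.
amp = np.array([0.8, 1.0, 1.2, 1.5, 1.9])
sbp = np.array([105.0, 112.0, 120.0, 131.0, 145.0])

# Fit the linear transfer function SBP ≈ a*amp + b by least squares.
a, b = np.polyfit(amp, sbp, 1)

def to_sbp(x):
    """Convert a sensor amplitude into an estimated SBP (mmHg)."""
    return a * x + b
```

Once calibrated per subject, such a function maps each beat's sensor reading directly to a pressure estimate, which is what enables continuous cuffless monitoring.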


Subject(s)
Arterial Pressure , Wearable Electronic Devices , Humans , Blood Pressure/physiology , Arterial Pressure/physiology , Blood Pressure Determination , Blood Pressure Monitors
3.
Sensors (Basel) ; 23(4)2023 Feb 08.
Article in English | MEDLINE | ID: mdl-36850523

ABSTRACT

The results obtained in the wafer test process are expressed as a wafer map and contain important information indicating whether each chip on the wafer is functioning normally. The defect patterns shown on the wafer map provide information about the process and equipment in which the defect occurred, but automated pattern classification is difficult to deploy at actual manufacturing sites unless processing speed and resource efficiency are ensured. The purpose of this study was to classify these defect patterns with a small amount of resources and time. To this end, we explored an efficient convolutional neural network model with three properties: (1) state-of-the-art performance, (2) low resource usage, and (3) fast processing time. We classified nine types of frequently found defect patterns (center, donut, edge-location, edge-ring, location, random, scratch, near-full, and none) using the open dataset WM-811K. We compared classification performance, resource usage, and processing time across EfficientNetV2, ShuffleNetV2, MobileNetV2, and MobileNetV3, which are among the smallest and latest lightweight convolutional neural network models. As a result, the MobileNetV3-based wafer map pattern classifier uses 7.5 times fewer parameters than ResNet, trains 7.2 times faster, and infers 4.9 times faster, while achieving a comparable level of performance with 98% accuracy and an 89.5% F1 score. It can therefore serve as a wafer map classification model in an actual manufacturing system without high-performance hardware.

4.
Sensors (Basel) ; 22(23)2022 Dec 02.
Article in English | MEDLINE | ID: mdl-36502101

ABSTRACT

"A picture is worth a thousand words." Given an image, humans can deduce various cause-and-effect captions of past, current, and future events beyond the image. The task of visual commonsense generation aims to generate three cause-and-effect captions for a given image: (1) what needed to happen before, (2) what the current intent is, and (3) what will happen after. However, this task is challenging for machines owing to two limitations: existing approaches (1) directly apply conventional vision-language transformers to learn relationships between input modalities and (2) ignore relations among the target cause-and-effect captions, considering each caption independently. Herein, we propose Cause-and-Effect BART (CE-BART), which is based on (1) a structured graph reasoner that captures intra- and inter-modality relationships among visual and textual representations and (2) a cause-and-effect generator that generates cause-and-effect captions by considering the causal relations among inferences. We demonstrate the validity of CE-BART on the VisualCOMET and AVSD benchmarks. CE-BART achieved SOTA performance on both benchmarks, while an extensive ablation study and qualitative analysis demonstrated the performance gains and improved interpretability.


Subject(s)
Language , Learning , Humans
5.
Sensors (Basel) ; 22(17)2022 Aug 24.
Article in English | MEDLINE | ID: mdl-36080822

ABSTRACT

This paper presents a Deep Convolutional Neural Network (DCNN) with an attention mechanism referred to as Dual-Scale Doppler Attention (DSDA) for human identification given a micro-Doppler (MD) signature as input. The MD signature contains unique gait characteristics produced by the motion of different-sized body parts: the arms and legs move rapidly, while the torso moves slowly. Each person is identified based on his or her unique gait characteristic in the MD signature. DSDA provides attention at different time-frequency resolutions to cater to the different MD components, which comprise both fast-varying and steady parts. Through this, DSDA can capture the unique gait characteristic of each person for human identification. We demonstrate the validity of DSDA on a recently published benchmark dataset, IDRad. The empirical results show that the proposed DSDA outperforms previous methods, and a qualitative analysis provides interpretability on MD signatures.
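The dual-resolution idea can be illustrated with a minimal sketch (toy signal, sample rate, and window sizes are invented; this is not the authors' implementation): the same signal is analyzed with a short window, which tracks fast limb motion in time, and a long window, which resolves the slowly varying torso component in frequency.

```python
import numpy as np

def spectrogram(x, win, hop):
    """Magnitude spectrogram via a plain windowed FFT (no SciPy needed)."""
    frames = [x[i:i + win] * np.hanning(win)
              for i in range(0, len(x) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

fs = 1000                                 # assumed sample rate (Hz)
t = np.arange(fs) / fs
# Toy stand-in for an MD signature: slow "torso" tone + fast "limb" tone.
x = np.sin(2 * np.pi * 40 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)

fine_time = spectrogram(x, win=64, hop=32)    # good time resolution
fine_freq = spectrogram(x, win=512, hop=32)   # good frequency resolution
```

An attention module operating over both representations can then weight whichever resolution is informative for a given time-frequency region.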


Subject(s)
Forensic Anthropology , Neural Networks, Computer , Female , Gait , Humans , Male , Ultrasonography, Doppler
6.
Sensors (Basel) ; 22(17)2022 Aug 29.
Article in English | MEDLINE | ID: mdl-36080961

ABSTRACT

In an attempt to overcome the limitations of reward-driven representation learning in vision-based reinforcement learning (RL), an unsupervised learning framework referred to as the visual pretraining via contrastive predictive model (VPCPM) is proposed to learn the representations detached from the policy learning. Our method enables the convolutional encoder to perceive the underlying dynamics through a pair of forward and inverse models under the supervision of the contrastive loss, thus resulting in better representations. In experiments with a diverse set of vision control tasks, by initializing the encoders with VPCPM, the performance of state-of-the-art vision-based RL algorithms is significantly boosted, with 44% and 10% improvement for RAD and DrQ at 100 steps, respectively. In comparison to the prior unsupervised methods, the performance of VPCPM matches or outperforms all the baselines. We further demonstrate that the learned representations successfully generalize to the new tasks that share a similar observation and action space.
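The contrastive supervision can be illustrated with a generic InfoNCE-style loss (a common formulation; the abstract does not specify VPCPM's exact loss, and the embeddings below are random stand-ins): each anchor representation should be most similar to its own positive among all positives in the batch.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """Contrastive (InfoNCE-style) loss: each anchor should be most
    similar to its own positive among all positives in the batch."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))      # diagonal = true pairs

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce(z, z + 0.01 * rng.normal(size=(8, 16)))   # good pairing
mismatched = info_nce(z, np.roll(z, 1, axis=0))              # wrong pairing
```

A low loss on correctly paired representations (and a high loss on mismatched ones) is the training signal that shapes the encoder before any reward-driven fine-tuning.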


Subject(s)
Algorithms , Reinforcement, Psychology , Reward
7.
Sensors (Basel) ; 22(14)2022 Jul 14.
Article in English | MEDLINE | ID: mdl-35890949

ABSTRACT

Recent studies have raised concerns regarding racial and gender disparities in facial attribute classification performance. As these attributes are directly and indirectly correlated with the sensitive attribute in a complex manner, simple disparate treatment is ineffective in reducing the performance disparity. This paper focuses on achieving counterfactual fairness for facial attribute classification. Each labeled input image is used to generate two synthetic replicas: one under factual assumptions about the sensitive attribute and one under counterfactual assumptions. The proposed causal graph-based attribute translation generates realistic counterfactual images that account for the complicated causal relationships among the attributes with an encoder-decoder framework. The causal graph represents the complex relationships among the attributes and is used to sample factual and counterfactual facial attributes for a given face image. The encoder-decoder architecture translates the given facial image to have the sampled factual or counterfactual attributes while preserving its identity. The attribute classifier is trained for fair prediction with counterfactual regularization between factual and corresponding counterfactual translated images. Extensive experimental results on the CelebA dataset demonstrate the effectiveness and interpretability of the proposed learning method for classifying multiple face attributes.
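The counterfactual regularization can be sketched generically (hypothetical logits and a squared-difference penalty; the paper's exact regularizer may differ): the classifier pays a standard classification loss on the factual image plus a penalty whenever its outputs change between the factual image and its counterfactual translation.

```python
import numpy as np

def fair_loss(logits_factual, logits_counterfactual, labels, lam=1.0):
    """Cross-entropy on the factual image plus a counterfactual-consistency
    penalty: predictions should not move when only the sensitive
    attribute is (counterfactually) flipped."""
    def ce(logits, y):
        z = logits - logits.max(axis=1, keepdims=True)
        logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -np.mean(logp[np.arange(len(y)), y])
    consistency = np.mean((logits_factual - logits_counterfactual) ** 2)
    return float(ce(logits_factual, labels) + lam * consistency)

# Toy logits for 2 samples, 2 attribute classes (illustrative values).
lf = np.array([[2.0, 0.0], [0.0, 2.0]])
y = np.array([0, 1])
consistent = fair_loss(lf, lf, y)          # counterfactual outputs identical
inconsistent = fair_loss(lf, lf + 1.0, y)  # outputs shift under the flip
```

Minimizing the penalty pushes the classifier toward predictions that are invariant to the sensitive attribute, which is the operational meaning of counterfactual fairness here.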


Subject(s)
Racial Groups , Specimen Handling , Humans
8.
Sci Adv ; 7(7)2021 02.
Article in English | MEDLINE | ID: mdl-33579699

ABSTRACT

Flexible resonant acoustic sensors have attracted substantial attention as an essential component for intuitive human-machine interaction (HMI) in the future voice user interface (VUI). Several studies have mimicked the basilar membrane, but they still have dimensional drawbacks owing to the difficulty of controlling multiple frequency bands and broadening the resonant spectrum to fully cover the phonetic frequencies. Here, a highly sensitive piezoelectric mobile acoustic sensor (PMAS) is demonstrated by exploiting an ultrathin membrane for biomimetic frequency band control. Simulation results prove that the resonant bandwidth of a piezoelectric film can be broadened by adopting a lead zirconate titanate (PZT) membrane on the ultrathin polymer to cover the entire voice spectrum. Machine learning-based biometric authentication is demonstrated by the integrated acoustic sensor module with an algorithm processor and a customized Android app. Last, an exceptional reduction in the speaker identification error rate is achieved by the PMAS module with a small amount of training data, compared to a conventional microelectromechanical system microphone.

9.
Adv Mater ; 32(35): e1904020, 2020 Sep.
Article in English | MEDLINE | ID: mdl-31617274

ABSTRACT

Flexible piezoelectric acoustic sensors have been developed to generate multiple sound signals with high sensitivity, shifting the paradigm of future voice technologies. Speech recognition based on advanced acoustic sensors and optimized machine learning software will serve as an innovative interface for artificial intelligence (AI) services. Collaboration and novel approaches involving both smart sensors and speech algorithms should be attempted to realize a hyperconnected society, which can offer personalized services such as biometric authentication, AI secretaries, and home appliances. Here, representative developments in speech recognition are reviewed in terms of flexible piezoelectric materials, self-powered sensors, machine learning algorithms, and speaker recognition.


Subject(s)
Acoustics/instrumentation , Electricity , Machine Learning , Signal Processing, Computer-Assisted/instrumentation , Speech , Humans , Mechanical Phenomena
10.
IEEE Trans Pattern Anal Mach Intell ; 36(9): 1761-74, 2014 Sep.
Article in English | MEDLINE | ID: mdl-26352230

ABSTRACT

In this paper, a hypergraph-based image segmentation framework is formulated in a supervised manner for many high-level computer vision tasks. To consider short- and long-range dependencies among various regions of an image and to incorporate a wider selection of features, higher-order correlation clustering (HO-CC) is incorporated into the framework. Correlation clustering (CC), a graph-partitioning algorithm, was recently shown to be effective in a number of applications such as natural language processing, document clustering, and image segmentation. It derives its partitioning result from a pairwise graph by optimizing a global objective function that simultaneously maximizes both intra-cluster similarity and inter-cluster dissimilarity. In HO-CC, the pairwise graph used in CC is generalized to a hypergraph, which can alleviate the local boundary ambiguities that can occur in CC. Fast inference is possible via linear programming relaxation, and effective parameter learning with a structured support vector machine is enabled by incorporating a decomposable structured loss function. Experimental results on various datasets show that the proposed HO-CC outperforms other state-of-the-art image segmentation algorithms. The HO-CC framework is therefore an efficient and flexible image segmentation framework.
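The pairwise correlation-clustering objective can be illustrated on a toy region graph (illustrative affinities and a brute-force search; the paper uses an LP relaxation, and HO-CC further generalizes the pairwise terms to hyperedges):

```python
import itertools

# Toy affinity matrix over 4 regions: positive = regions look similar
# (reward for merging), negative = dissimilar (reward for splitting).
W = {(0, 1): 0.9, (0, 2): -0.6, (0, 3): -0.7,
     (1, 2): -0.4, (1, 3): -0.5, (2, 3): 0.8}

def cc_score(labels):
    """Correlation-clustering objective: sum the affinities of all
    within-cluster pairs. A good partition keeps similar pairs together
    and separates dissimilar ones."""
    return sum(w for (i, j), w in W.items() if labels[i] == labels[j])

# Brute force over all labelings of 4 regions (fine at toy scale only).
best = max(itertools.product(range(4), repeat=4), key=cc_score)
```

Note that the number of clusters falls out of the optimization rather than being fixed in advance, which is one of CC's appeals for segmentation.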

11.
IEEE Trans Image Process ; 22(2): 488-500, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23008253

ABSTRACT

Image partitioning is an important preprocessing step for many of the state-of-the-art algorithms used to perform high-level computer vision tasks. Typically, partitioning is conducted without regard to the task at hand. We propose a task-specific image partitioning framework that produces a region-based image representation leading to higher task performance than any task-oblivious partitioning framework or the existing supervised partitioning frameworks, which are few in number. The proposed method partitions the image by means of correlation clustering, maximizing a linear discriminant function defined over a superpixel graph. The parameters of the discriminant function, which define task-specific similarity and dissimilarity among superpixels, are estimated with a structured support vector machine (S-SVM) using task-specific training data. S-SVM learning leads to better generalization ability, while the construction of the superpixel graph used to define the discriminant function allows a rich set of features to be incorporated to improve discriminability and robustness. We evaluate the learned task-aware partitioning algorithms on three benchmark datasets. The results show that task-aware partitioning leads to better labeling performance than the partitions computed by state-of-the-art general-purpose and supervised partitioning algorithms. We believe that the task-specific image partitioning paradigm is widely applicable to improving performance in high-level image understanding tasks.
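The S-SVM training signal can be sketched with a structured hinge loss on a toy partitioning problem (the feature encoding, candidate set, and Hamming task loss below are hypothetical simplifications; the actual feature set and inference are far richer):

```python
import numpy as np

# Toy encoding: a partition of 3 superpixels is a label tuple; features
# count which "similar" / "dissimilar" pairs the partition merges.
similar, dissimilar = {(0, 1)}, {(0, 2), (1, 2)}

def feats(y):
    merged = {p for p in similar | dissimilar if y[p[0]] == y[p[1]]}
    return np.array([len(merged & similar), len(merged & dissimilar)], float)

def task_loss(y, y_true):
    """Hamming-style task loss between two labelings."""
    return float(sum(a != b for a, b in zip(y, y_true)))

def structured_hinge(w, y_true, candidates):
    """S-SVM structured hinge: the ground-truth partition must outscore
    every candidate by a margin that grows with the task loss."""
    score = lambda y: float(w @ feats(y))
    worst = max(candidates, key=lambda y: score(y) + task_loss(y, y_true))
    return max(0.0, score(worst) + task_loss(worst, y_true) - score(y_true))

w = np.array([1.0, -1.0])   # reward merging similar, penalize dissimilar
y_true = (0, 0, 1)
hinge = structured_hinge(w, y_true, [(0, 0, 0), (0, 1, 1), (0, 1, 2)])
```

Training drives this hinge toward zero, so the learned weights make the task-correct partition the highest-scoring one by a loss-scaled margin.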
