Results 1 - 8 of 8
1.
Article in English | MEDLINE | ID: mdl-36624800

ABSTRACT

Federated learning is an emerging research paradigm enabling collaborative training of machine learning models among different organizations while keeping data private at each institution. Despite recent progress, there remain fundamental challenges such as the lack of convergence and the potential for catastrophic forgetting across real-world heterogeneous devices. In this paper, we demonstrate that self-attention-based architectures (e.g., Transformers) are more robust to distribution shifts and hence improve federated learning over heterogeneous data. Concretely, we conduct the first rigorous empirical investigation of different neural architectures across a range of federated algorithms, real-world benchmarks, and heterogeneous data splits. Our experiments show that simply replacing convolutional networks with Transformers can greatly reduce catastrophic forgetting of previous devices, accelerate convergence, and reach a better global model, especially when dealing with heterogeneous data. We release our code and pretrained models to encourage future exploration in robust architectures as an alternative to current research efforts on the optimization front.
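
To make the setup concrete, here is a minimal Python sketch of federated averaging (FedAvg) over heterogeneous client data. All names are illustrative and this is not the paper's released code; the point is that the architecture choice enters only through the local training step, which is why swapping a convolutional backbone for a Transformer leaves the federated algorithm itself untouched.

# Minimal FedAvg sketch (hypothetical names; not the paper's released code).
import numpy as np

def local_train(global_weights, client_data, lr=0.1, steps=10):
    """Toy local update: least-squares regression via gradient descent on one client."""
    w = global_weights.copy()
    X, y = client_data
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg(global_weights, clients, rounds=5):
    for _ in range(rounds):
        updates = [local_train(global_weights, c) for c in clients]
        sizes = np.array([len(c[1]) for c in clients], dtype=float)
        # Weighted average of client models, proportional to local data size.
        global_weights = np.average(updates, axis=0, weights=sizes)
    return global_weights

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
# Heterogeneous splits: each client draws inputs from a shifted distribution.
clients = []
for shift in (-2.0, 0.0, 3.0):
    X = rng.normal(shift, 1.0, size=(50, 2))
    clients.append((X, X @ true_w + rng.normal(0, 0.1, 50)))
print(fedavg(np.zeros(2), clients))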

2.
Adv Neural Inf Process Syst ; 2021(DB1): 1-20, 2021 Dec.
Article in English | MEDLINE | ID: mdl-38774625

ABSTRACT

Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world applications in multimedia, affective computing, robotics, finance, human-computer interaction, and healthcare. Unfortunately, multimodal research has seen limited resources to study (1) generalization across domains and modalities, (2) complexity during training and inference, and (3) robustness to noisy and missing modalities. In order to accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiBench, a systematic and unified large-scale benchmark for multimodal learning spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. MultiBench provides an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental setup, and model evaluation. To enable holistic evaluation, MultiBench offers a comprehensive methodology to assess (1) generalization, (2) time and space complexity, and (3) modality robustness. MultiBench introduces impactful challenges for future research, including scalability to large-scale multimodal datasets and robustness to realistic imperfections. To accompany this benchmark, we also provide a standardized implementation of 20 core approaches in multimodal learning spanning innovations in fusion paradigms, optimization objectives, and training approaches. Simply applying methods proposed in different research areas can improve the state-of-the-art performance on 9/15 datasets. Therefore, MultiBench presents a milestone in unifying disjoint efforts in multimodal machine learning research and paves the way towards a better understanding of the capabilities and limitations of multimodal models, all the while ensuring ease of use, accessibility, and reproducibility. MultiBench, our standardized implementations, and leaderboards are publicly available, will be regularly updated, and welcome input from the community.
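
As an illustration of the three evaluation axes the benchmark standardizes (generalization, complexity, and modality robustness), here is a generic Python sketch; the function names are hypothetical and do not reflect the actual MultiBench API.

# Generic sketch of the three evaluation axes (illustrative names only,
# not the actual MultiBench API).
import time
import numpy as np

def evaluate(model, X_mods, y, drop_prob=0.0, rng=None):
    rng = rng or np.random.default_rng(0)
    # Robustness axis: randomly zero out whole modalities at test time.
    noisy = [m * (rng.random() >= drop_prob) for m in X_mods]
    start = time.perf_counter()
    preds = model(noisy)
    elapsed = time.perf_counter() - start          # complexity axis
    acc = float(np.mean(preds == y))               # generalization axis
    return {"accuracy": acc, "inference_s": elapsed}

# Toy late-fusion model: average per-modality scores, then threshold.
def late_fusion(mods):
    return (np.mean([m.mean(axis=1) for m in mods], axis=0) > 0).astype(int)

rng = np.random.default_rng(1)
y = rng.integers(0, 2, 100)
mods = [rng.normal(0, 1, (100, 8)) + y[:, None] for _ in range(3)]
for p in (0.0, 0.5):
    print(p, evaluate(late_fusion, mods, y, drop_prob=p, rng=rng))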

3.
Proc Conf Empir Methods Nat Lang Process ; 2020: 1801-1812, 2020 Nov.
Article in English | MEDLINE | ID: mdl-33969362

ABSTRACT

Modeling multimodal language is a core research area in natural language processing. While languages such as English have relatively large multimodal language resources, other widely spoken languages across the globe have few or no large-scale datasets in this area. This disproportionately affects native speakers of languages other than English. As a step towards building more equitable and inclusive multimodal systems, we introduce the first large-scale multimodal language dataset for Spanish, Portuguese, German and French. The proposed dataset, called CMU-MOSEAS (CMU Multimodal Opinion Sentiment, Emotions and Attributes), is the largest of its kind with 40,000 total labelled sentences. It covers a diverse set of topics and speakers, and carries supervision of 20 labels including sentiment (and subjectivity), emotions, and attributes. Our evaluations on a state-of-the-art multimodal model demonstrate that CMU-MOSEAS enables further research for multilingual studies in multimodal language.
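
To illustrate what one labelled example in such a multimodal, multi-label dataset might look like, here is a hypothetical Python record; the field names and feature dimensions are assumptions for illustration, not the released CMU-MOSEAS schema.

# Illustrative shape of one multimodal, multi-label example (hypothetical
# fields, not the released CMU-MOSEAS schema).
from dataclasses import dataclass, field
import numpy as np

@dataclass
class LabeledSentence:
    language: str                       # "es", "pt", "de", or "fr"
    words: list[str]                    # transcript tokens
    acoustic: np.ndarray                # (T, d_a) frame-level audio features
    visual: np.ndarray                  # (T, d_v) frame-level face features
    labels: dict[str, float] = field(default_factory=dict)  # up to 20 continuous labels

ex = LabeledSentence(
    language="es",
    words=["una", "película", "excelente"],
    acoustic=np.zeros((30, 74)),        # dimensions are placeholders
    visual=np.zeros((30, 35)),
    labels={"sentiment": 2.5, "subjectivity": 1.0, "happiness": 1.5},
)
print(ex.language, len(ex.words), ex.labels["sentiment"])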

4.
Rapid Commun Mass Spectrom ; 35 Suppl 2: e8688, 2019 Dec 03.
Article in English | MEDLINE | ID: mdl-31794630

ABSTRACT

RATIONALE: Matrix interference attributed to urea and other nitrogenous substances in unprocessed urine is significant. In this study, desorption ionization of sub-microliter volume samples is performed in an effort to improve the detection of drugs in unprocessed urine using transmission mode-direct analysis in real time mass spectrometry (TM-DART-MS). METHODS: Urine samples were spiked with analytical standards of two drugs of abuse, codeine and methadone. Various sub-microliter volumes of unprocessed urine were deposited onto wire mesh screen consumables and analyzed using TM-DART for desorption ionization and a high-resolution mass spectrometer operated in full scan mode for mass analysis. A 2² factorial design of experiment (DOE) was employed to examine the effects of sample volume and sample introduction speed to the DART source. RESULTS: Results from analysis of one-microliter and sub-microliter sample volumes were compared by measuring the signal produced by TM-DART-MS. At a significance level (α) of 0.05, the lower-volume samples yielded spectra in which the abundance of urea and creatinine ions was reduced, thus significantly improving the TM-DART-MS signal for drugs of abuse. Slower sample introduction speeds increased the time during which the sample was exposed to the heated ionization gas, resulting in a significant increase in the TM-DART-MS signal. CONCLUSIONS: Reducing the sample volume to sub-microliter levels improved the detection of drugs of abuse present either individually or in combination in the untreated urine. The improved signal demonstrates the potential for using sub-microliter volumes to screen for drugs in urine without the need for chromatography or sample pretreatment.
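
For readers unfamiliar with factorial designs, here is a minimal Python sketch of how main effects and the interaction are computed from a 2² design like the one above (two factors: sample volume and introduction speed). The response values are made-up illustrative numbers, not the study's data.

# Minimal 2x2 factorial design analysis (fabricated illustrative responses).
import itertools
import numpy as np

# Coded levels: -1 = low, +1 = high; runs ordered (volume, speed).
runs = list(itertools.product([-1, 1], [-1, 1]))
A = np.array(runs, dtype=float)                   # factor columns
signal = np.array([9.0, 5.0, 12.0, 7.0])          # made-up responses

# Each effect = mean(response at high) - mean(response at low).
main_effects = signal @ A / 2
interaction = signal @ (A[:, 0] * A[:, 1]) / 2
print("volume effect:", main_effects[0])
print("speed effect:", main_effects[1])
print("volume x speed interaction:", interaction)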

5.
Rapid Commun Mass Spectrom ; 33(18): 1423-1433, 2019 Sep 30.
Article in English | MEDLINE | ID: mdl-31063263

ABSTRACT

RATIONALE: The workload of clinical laboratories has been steadily increasing over the last few years. High-throughput (HT) sample processing allows scientists to spend more time on critical thinking rather than on laborious sample processing. Herein we introduce an HT 96-solid-phase microextraction (SPME) transmission mode (TM) system coupled to direct analysis in real time (DART) mass spectrometry (MS). METHODS: Model compounds (opioids) were extracted from urine and plasma samples using a 96-SPME-TM device. A standard voltage and pressure (SVP) DART source was used for all experiments. SPME-TM performance was examined using high-resolution mass spectrometry (HRMS) in full scan mode (m/z 100-500), whereas quantitation of opioids was performed using triple quadrupole MS in multiple reaction monitoring mode with a matrix-matched internal standard correction method. RESULTS: Thirteen points (0.5 to 200 ng mL⁻¹) were used to establish a calibration curve. Low limits of quantitation (LOQ) were obtained (0.5 to 25 ng mL⁻¹) for the matrices used. Acceptable accuracy (71.4-129.4%) and repeatability (1.1-24%) were obtained for the validation levels tested (0.5, 30 and 90 ng mL⁻¹). In less than 1.5 hours, 96 samples were extracted, desorbed and processed using the 96-SPME-TM system coupled to DART-MS. CONCLUSIONS: A rapid HT method for the detection of opioids in urine and plasma samples was developed. This study demonstrated that ambient ionization mass spectrometry coupled to robust sample preparation methods such as SPME-TM can rapidly and efficiently screen for and quantify target analytes in an HT context.


Subject(s)
Analgesics, Opioid/blood; Analgesics, Opioid/urine; Mass Spectrometry/methods; Solid Phase Microextraction/instrumentation; Solid Phase Microextraction/methods; Substance Abuse Detection/methods; Calibration; Equipment Design; Humans; Limit of Detection; Sensitivity and Specificity; Substance Abuse Detection/instrumentation
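
As background on the quantitation step, here is a minimal Python sketch of a matrix-matched internal-standard calibration like the one described above: fit the analyte-to-internal-standard response ratio against known concentrations, then back-calculate unknowns. All numbers are illustrative, not the study's data.

# Internal-standard calibration sketch (illustrative numbers only).
import numpy as np

conc = np.array([0.5, 1, 5, 10, 25, 50, 100, 200])   # ng/mL standards
# Simulated analyte / internal-standard response ratios with small noise.
ratio = 0.04 * conc + 0.01 + np.random.default_rng(2).normal(0, 0.01, conc.size)

slope, intercept = np.polyfit(conc, ratio, 1)        # linear calibration
r2 = np.corrcoef(conc, ratio)[0, 1] ** 2

def back_calculate(sample_ratio):
    """Convert a measured response ratio back to concentration (ng/mL)."""
    return (sample_ratio - intercept) / slope

print(f"slope={slope:.4f} intercept={intercept:.4f} R2={r2:.4f}")
print("QC sample at ratio 1.2 ->", round(back_calculate(1.2), 1), "ng/mL")
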
6.
Proc Conf Assoc Comput Linguist Meet ; 2019: 6558-6569, 2019 Jul.
Article in English | MEDLINE | ID: mdl-32362720

ABSTRACT

Human language is often multimodal, comprising a mixture of natural language, facial gestures, and acoustic behaviors. However, two major challenges exist in modeling such multimodal human language time-series data: 1) inherent data non-alignment due to variable sampling rates for the sequences from each modality; and 2) long-range dependencies between elements across modalities. In this paper, we introduce the Multimodal Transformer (MulT) to generically address the above issues in an end-to-end manner without explicitly aligning the data. At the heart of our model is the directional pairwise cross-modal attention, which attends to interactions between multimodal sequences across distinct time steps and latently adapts streams from one modality to another. Comprehensive experiments on both aligned and non-aligned multimodal time-series show that our model outperforms state-of-the-art methods by a large margin. In addition, empirical analysis suggests that the proposed crossmodal attention mechanism in MulT is able to capture correlated crossmodal signals.
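
A minimal numpy sketch of the directional cross-modal attention the abstract describes (one target modality querying one source modality of a different length) is given below; the real MulT stacks many such layers with learned projections, whereas the weight matrices here are random stand-ins.

# Directional cross-modal attention sketch (random stand-in weights).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def crossmodal_attention(target, source, Wq, Wk, Wv):
    """target: (T_t, d), source: (T_s, d); sequences need not be aligned."""
    Q, K, V = target @ Wq, source @ Wk, source @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (T_t, T_s): every target step
    return softmax(scores) @ V                #   attends to every source step

rng = np.random.default_rng(0)
d = 16
text = rng.normal(size=(12, d))     # 12 word-level steps
audio = rng.normal(size=(50, d))    # 50 frame-level steps (different rate)
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
print(crossmodal_attention(text, audio, Wq, Wk, Wv).shape)  # (12, 16)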

7.
Proc AAAI Conf Artif Intell ; 33(1): 7216-7223, 2019 Jul.
Article in English | MEDLINE | ID: mdl-32219010

ABSTRACT

Humans convey their intentions through both verbal and nonverbal behaviors during face-to-face communication. Speaker intentions often vary dynamically depending on different nonverbal contexts, such as vocal patterns and facial expressions. As a result, when modeling human language, it is essential to consider not only the literal meaning of the words but also the nonverbal contexts in which these words appear. To better model human language, we first model expressive nonverbal representations by analyzing the fine-grained visual and acoustic patterns that occur during word segments. In addition, we seek to capture the dynamic nature of nonverbal intents by shifting word representations based on the accompanying nonverbal behaviors. To this end, we propose the Recurrent Attended Variation Embedding Network (RAVEN) that models the fine-grained structure of nonverbal subword sequences and dynamically shifts word representations based on nonverbal cues. Our proposed model achieves competitive performance on two publicly available datasets for multimodal sentiment analysis and emotion recognition. We also visualize the shifted word representations in different nonverbal contexts and summarize common patterns regarding multimodal variations of word representations.
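
A toy Python sketch of the word-shift idea follows: a word embedding is displaced by a vector computed from the co-occurring nonverbal features. The projection and gate here are random, simplified stand-ins for RAVEN's learned attention and gating networks.

# Toy word-shift sketch (random stand-ins for learned components).
import numpy as np

rng = np.random.default_rng(0)
d_word, d_nonverbal = 8, 6
W_shift = rng.normal(0, 0.1, (d_nonverbal, d_word))

def shift_word(word_vec, nonverbal_feats):
    """nonverbal_feats: (T, d_nonverbal) frames aligned to this word."""
    h = nonverbal_feats.mean(axis=0) @ W_shift      # nonverbal summary vector
    gate = np.tanh(np.linalg.norm(h))               # scale of the shift
    return word_vec + gate * h                      # shifted representation

word = rng.normal(size=d_word)
frames = rng.normal(size=(15, d_nonverbal))         # audio + visual frames
print(np.linalg.norm(shift_word(word, frames) - word))  # magnitude of shift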

8.
Proc AAAI Conf Artif Intell ; 2018: 5642-5649, 2018 Feb.
Article in English | MEDLINE | ID: mdl-32257595

ABSTRACT

Human face-to-face communication is a complex multimodal signal. We use words (language modality), gestures (vision modality) and changes in tone (acoustic modality) to convey our intentions. Humans easily process and understand face-to-face communication; however, comprehending this form of communication remains a significant challenge for Artificial Intelligence (AI). AI must understand each modality and the interactions between them that shape the communication. In this paper, we present a novel neural architecture for understanding human communication called the Multi-attention Recurrent Network (MARN). The main strength of our model comes from discovering interactions between modalities through time using a neural component called the Multi-attention Block (MAB) and storing them in the hybrid memory of a recurrent component called the Long-short Term Hybrid Memory (LSTHM). We perform extensive comparisons on six publicly available datasets for multimodal sentiment analysis, speaker trait recognition and emotion recognition. MARN achieves state-of-the-art performance on all the datasets.
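
A rough Python sketch of the multi-attention idea follows: compute several parallel attentions over the concatenated per-modality hidden states and keep all the weighted readouts. The learned layers are random stand-ins, and the recurrent LSTHM component is omitted.

# Rough multi-attention sketch (random stand-in weights; LSTHM omitted).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def multi_attention_block(h_cat, W_att, W_out):
    """h_cat: concatenated hidden states of all modalities, shape (d,)."""
    readouts = []
    for w in W_att:                      # one attention per row of W_att
        a = softmax(w * h_cat)           # which dimensions interact now
        readouts.append(a * h_cat)       # attended copy of the state
    return np.concatenate(readouts) @ W_out   # compress K readouts to (d,)

rng = np.random.default_rng(0)
d, K = 12, 4
h = rng.normal(size=d)                   # e.g. [text; audio; video] states
W_att = rng.normal(size=(K, d))
W_out = rng.normal(0, 0.1, (K * d, d))
print(multi_attention_block(h, W_att, W_out).shape)   # (12,)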
