publications
2025
- Learning subjective time-series data via Utopia Label Distribution Approximation. Pattern Recognition, 2025
Subjective time-series regression (STR) tasks have gained increasing attention recently. However, most existing methods overlook the label distribution bias in STR data, which results in biased models. Emerging studies on imbalanced regression tasks, such as age and depth estimation, assume that the label distribution is uniform and known. In reality, the label distribution of the test set in STR tasks is usually non-uniform and unknown. Moreover, time-series data exhibit continuity in both the temporal context and label spaces, which existing methods have not addressed. To tackle these issues, we propose a Utopia Label Distribution Approximation (ULDA) method, which brings the training label distribution closer to the real-world but unknown (utopia) label distribution in order to calibrate the training and test sets. The utopia label distribution is generated by convolving the original one with a Gaussian kernel. ULDA also has two newly devised modules: Time-slice Normal Sampling (TNS), which generates the required new samples, and Convolution Weighted Loss (CWL), which lowers the learning weights of redundant samples. These modules not only assist model training but also maintain sample continuity in the temporal context space. Extensive experiments demonstrate that ULDA lifts the state-of-the-art performance on STR tasks and shows considerable generalization ability to other time-series tasks.
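As a rough illustration of the Gaussian-kernel convolution step mentioned in the abstract, the sketch below smooths a binned label histogram. It is not the ULDA implementation: the function names, the bin layout, the `radius` and `sigma` parameters, and the clamped border handling are all assumptions made for this example.

```python
import math


def gaussian_kernel(radius, sigma):
    """Return a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [
        math.exp(-(k * k) / (2.0 * sigma * sigma))
        for k in range(-radius, radius + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_label_histogram(counts, radius=2, sigma=1.0):
    """Convolve a label histogram with a Gaussian kernel.

    Border handling is an assumption of this sketch: indices that fall
    outside the histogram are clamped to the nearest valid bin.
    """
    kernel = gaussian_kernel(radius, sigma)
    n = len(counts)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for offset, w in zip(range(-radius, radius + 1), kernel):
            j = min(max(i + offset, 0), n - 1)  # clamp to a valid bin
            acc += w * counts[j]
        smoothed.append(acc)
    return smoothed


# Hypothetical label histogram with one over-represented bin.
original = [0, 0, 10, 0, 0]
utopia_like = smooth_label_histogram(original)
```

In this toy histogram the peak bin is lowered and its neighbours are raised, which matches the intuition of redundant bins needing less weight and sparse bins needing new samples.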
2024
- Leveraging Knowledge of Modality Experts for Incomplete Multimodal Learning. Wenxin Xu*, Hexin Jiang*, and Xuefeng Liang. In Proceedings of the 32nd ACM International Conference on Multimedia (ACM MM), Honourable Mention Award, 2024
Multimodal Emotion Recognition (MER) may encounter incomplete multimodal scenarios caused by sensor damage or privacy protection in practical applications. Existing incomplete multimodal learning methods focus on learning better joint representations across modalities. However, our investigation shows that they fall short in learning unimodal representations, which are also highly discriminative. To address this, we propose a novel framework named Mixture of Modality Knowledge Experts (MoMKE) with two-stage training. In unimodal expert training, each expert learns the unimodal knowledge from its corresponding modality. In experts mixing training, both unimodal and joint representations are learned by leveraging the knowledge of all modality experts. In addition, we design a special Soft Router that enriches the modality representations by dynamically mixing the unimodal and joint representations. Various incomplete multimodal experiments on three benchmark datasets showcase the robust performance of MoMKE, especially under severely incomplete conditions. Visualization analysis further reveals the considerable value of unimodal and joint representations. Code is released at https://github.com/wxxv/MoMKE.
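The idea of dynamically mixing several representations can be sketched with a generic softmax-weighted sum. This is only an illustration of soft routing in general and makes no claim about how the MoMKE Soft Router is built; the function names, the plain-list vectors, and the given routing scores are assumptions of this example (see the linked repository for the actual code).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of routing scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_mix(representations, scores):
    """Mix equal-length vectors using softmax-normalized routing scores.

    `representations` is a list of vectors, for example one per expert;
    `scores` holds one unnormalized routing score per vector.
    """
    weights = softmax(scores)
    dim = len(representations[0])
    return [
        sum(w * rep[d] for w, rep in zip(weights, representations))
        for d in range(dim)
    ]


# Hypothetical outputs of two experts, mixed with equal routing scores.
mixed = soft_mix([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

With equal scores the result is the plain average of the inputs; a larger score pulls the mixture toward the corresponding representation.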
2023
- Pairwise-Emotion Data Distribution Smoothing for Emotion Recognition. In Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2023
In speech emotion recognition tasks, models learn emotional representations from datasets. We find that the data distribution in the IEMOCAP dataset is highly imbalanced, which may hinder models from learning a better representation. To address this issue, we propose a novel Pairwise-emotion Data Distribution Smoothing (PDDS) method. PDDS assumes that the distribution of emotional data should be smooth in reality, and applies Gaussian smoothing to emotion pairs to construct a new training set with a smoother distribution. The required new data are supplied using mixup augmentation. As PDDS is model and modality agnostic, it is evaluated with three state-of-the-art models on two benchmark datasets. The experimental results show that these models are improved by 0.2%–4.8% and 0.1%–5.9% in terms of weighted accuracy and unweighted accuracy, respectively. In addition, an ablation study demonstrates that the key advantage of PDDS is the reasonable data distribution rather than simple data augmentation.
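For readers unfamiliar with mixup, a minimal sketch of the standard formulation follows: two samples and their labels are blended with a coefficient drawn from a Beta distribution. How PDDS selects emotion pairs and how many samples it generates is not covered here; the function name, the `alpha` default, and the toy feature vectors are assumptions of this example.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Blend two samples and their label vectors with standard mixup.

    The coefficient lam is drawn from Beta(alpha, alpha); features and
    labels are interpolated with the same lam.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


# Hypothetical feature vectors with one-hot labels for two emotions.
rng = random.Random(0)
x_new, y_new, lam = mixup([1.0, 2.0], [1.0, 0.0], [3.0, 4.0], [0.0, 1.0], rng=rng)
```

The blended label stays a valid distribution over the two classes, so the new sample sits between the two emotions rather than belonging fully to either.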