
Domain adaptation arises as an important problem in statistical learning theory when the data-generating processes differ between training and test samples, respectively called source and target domains. Recent theoretical advances show that the success of domain adaptation algorithms heavily relies on their ability to minimize the divergence between the probability distributions of the source and target domains. However, minimizing this divergence cannot be done independently of the minimization of other key ingredients such as the source risk or the combined error of the ideal joint hypothesis. The trade-off between these terms is often ensured by algorithmic solutions that remain implicit and are not directly reflected by the theoretical guarantees. To address this issue, we propose in this paper a new theoretical framework for domain adaptation through hierarchical optimal transport. This framework provides more explicit generalization bounds and allows us to consider the natural hierarchical organization of samples in both domains into classes or clusters. Additionally, we provide a new divergence measure between the source and target domains, called the Hierarchical Wasserstein distance, that indicates, under mild assumptions, which structures have to be aligned to lead to a successful adaptation.
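A hierarchical optimal transport distance of this flavor can be sketched as a two-level computation: a Wasserstein cost between each source class and each target cluster serves as the ground cost of an outer OT problem over the groups. The sketch below is a minimal illustration under simplifying assumptions (uniform weights, squared Euclidean ground cost, exact OT solved as a small linear program via SciPy); it is not the paper's exact formulation, and all function names are our own.

```python
import numpy as np
from scipy.optimize import linprog

def emd2(a, b, M):
    """Exact OT cost between histograms a, b with ground cost matrix M,
    solved as a linear program over the transport plan (row-major flattened)."""
    n, m = M.shape
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):                      # row-marginal constraints
        A_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):                      # column-marginal constraints
        A_eq[n + j, j::m] = 1.0
    res = linprog(M.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return res.fun

def wasserstein(X, Y):
    """OT cost between two point clouds, uniform weights, squared Euclidean cost."""
    M = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return emd2(np.full(len(X), 1 / len(X)), np.full(len(Y), 1 / len(Y)), M)

def hierarchical_wasserstein(src_groups, tgt_groups):
    """Outer OT over groups whose ground cost is the group-to-group Wasserstein cost."""
    C = np.array([[wasserstein(S, T) for T in tgt_groups] for S in src_groups])
    w_s = np.array([len(S) for S in src_groups], float); w_s /= w_s.sum()
    w_t = np.array([len(T) for T in tgt_groups], float); w_t /= w_t.sum()
    return emd2(w_s, w_t, C)
```

The distance is zero when the group structures coincide, and the outer coupling indicates which source class should be aligned with which target cluster.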
Human Activity Recognition (HAR) is a field of study that focuses on identifying and classifying human activities. Skeleton-based HAR has received much attention in recent years, where Graph Convolutional Network (GCN) based methods are widely used and have achieved remarkable results. However, the representation of skeleton data and the issue of over-smoothing in GCNs still need to be studied. 1) Compared to central nodes, edge nodes can only aggregate limited neighbor information, yet different edge nodes of the human body are often structurally related, and the information from edge nodes is crucial for fine-grained activity recognition. 2) Graph Convolutional Networks suffer from a significant over-smoothing issue, causing nodes to become increasingly similar as the number of network layers increases. Based on these two observations, we propose a two-stream graph convolution method called Spatial-Structural GCN (SpSt-GCN). The spatial GCN performs information aggregation based on the topological structure of the human body, while the structural GCN performs differentiation based on the similarity of edge node sequences. The spatial connection is fixed: the human skeleton naturally maintains this topology regardless of the actions performed. The structural connection, however, is dynamic and depends on the type of movement the body is performing. Based on this idea, we also propose an entirely data-driven structural connection, which greatly increases flexibility. We evaluate our method on two large-scale datasets, i.e., NTU RGB+D and NTU RGB+D 120. The proposed method achieves good results while being efficient.
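The spatial stream builds on the standard GCN propagation rule, which averages each joint's features with its skeletal neighbors; this is also where over-smoothing originates, since repeated averaging makes node features converge. The sketch below shows only this standard propagation step (not the authors' SpSt-GCN), in plain NumPy with symmetric normalization and self-loops:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One standard graph-convolution step: H' = ReLU(D^{-1/2}(A+I)D^{-1/2} H W).
    A: joint adjacency (skeleton topology), H: node features, W: learnable weights."""
    A_hat = A + np.eye(A.shape[0])          # self-loops so each joint keeps its own signal
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)         # symmetric degree normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)
```

On a chain-shaped skeleton fragment, the two end joints (edge nodes) each aggregate from a single neighbor, which illustrates the limited receptive field of edge nodes mentioned above.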
Domain adaptation is a subfield of statistical learning theory that takes into account the shift between the distribution of training and test data, typically known as source and target domains, respectively. In this context, this paper presents an incremental approach to tackle the intricate challenge of unsupervised domain adaptation, where labeled data within the target domain is unavailable. The proposed approach, OTP-DA, endeavors to learn a sequence of joint subspaces from both the source and target domains using Linear Discriminant Analysis (LDA), such that the projected data into these subspaces are domain-invariant and well-separated. Nonetheless, the necessity of labeled data for LDA to derive the projection matrix presents a substantial impediment, given the absence of labels within the target domain in the setting of unsupervised domain adaptation. To circumvent this limitation, we introduce a selective label propagation technique grounded on optimal transport (OTP), to generate pseudo-labels for target data, which serve as surrogates for the unknown labels. We anticipate that the process of inferring labels for target data will be substantially streamlined within the acquired latent subspaces, thereby facilitating a self-training mechanism. Furthermore, our paper provides a rigorous theoretical analysis of OTP-DA, underpinned by the concept of weak domain adaptation learners, thereby elucidating the requisite conditions for the proposed approach to solve the problem of unsupervised domain adaptation efficiently. Experimentation across a spectrum of visual domain adaptation problems suggests that OTP-DA exhibits promising efficacy and robustness, positioning it favorably compared to several state-of-the-art methods.
As machine learning models gain traction in real-world applications, user demand for transparent results grows. The field of explainability (XAI) is meeting this challenge with remarkable speed and efficiency. Notable examples include SHAP and LIME, which are feature-based XAI methods. In this work we review a distinct category of XAI approaches that ground their explanations in interpretable explanatory elements representing user knowledge, instead of raw input features. We categorize these methods based on the stage at which the knowledge is integrated into the XAI pipeline. Furthermore, we highlight the literature on the assessment of XAI methods. We emphasize the importance of measuring the faithfulness of knowledge-based explanations, not only to the real world but also to the underlying model.
Recent advancements in machine learning have highlighted the importance of integrating different data sources to improve classification model performance. By utilizing multiple data representations, a richer understanding of subjects or objects can be achieved. For instance, in the field of emotion recognition, systems combining multiple sources and/or modalities of information (e.g., voice, text, facial expression, body posture) perform better than those relying solely on a single modality. The challenge lies in fusing distinct types of data, such as image, text, audio, or video, that are not naturally aligned.
Research in automatic emotion recognition has been active for many decades, with applications as diverse as healthcare and entertainment. Our study focuses on methods that predict emotions from images of facial expressions (FER). Three state-of-the-art models for FER tasks were selected for experimentation. They differ in their architectures and in the method used to improve the quality of emotion inference. Our experiments provide a fair comparison of their performance on three datasets that differ in size, image collection method, and class distribution.
Non-negative matrix factorization (NMF) is an unsupervised clustering algorithm in which a non-negative data matrix is factorized into (usually) two matrices with the property that none of the matrices has any negative elements. This factorization raises the problem of instability: whenever we run NMF on the same dataset, we may get a different factorization. To solve this non-uniqueness problem and obtain a more stable solution, we propose a new approach that consists in making different NMF models collaborate, followed by a consensus step. The proposed approach was validated on several datasets, and the experimental results showed the effectiveness of our approach, which reduces the standard reconstruction error of the NMF model.
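The instability and the consensus idea can be illustrated concretely: run NMF several times with different random initializations, read off a clustering from each run, and aggregate the runs through a co-association (consensus) matrix. This is a minimal sketch with scikit-learn, not the paper's collaborative scheme; the aggregation by spectral clustering on the consensus matrix is one common choice among several.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.cluster import SpectralClustering

def consensus_nmf(X, k, n_runs=5):
    """Aggregate several unstable NMF runs into one stable clustering."""
    n = X.shape[0]
    co = np.zeros((n, n))
    for seed in range(n_runs):
        # Each run may converge to a different factorization (non-uniqueness).
        W = NMF(n_components=k, init="random", random_state=seed,
                max_iter=500).fit_transform(X)
        labels = W.argmax(axis=1)            # cluster = dominant latent factor
        co += (labels[:, None] == labels[None, :])
    co /= n_runs                             # co-association: fraction of co-clustered runs
    return SpectralClustering(n_clusters=k, affinity="precomputed",
                              random_state=0).fit_predict(co)
```

Samples that land in the same cluster across most runs stay together in the consensus, smoothing out run-to-run variability.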
In this paper, we propose a novel approach for unsupervised domain adaptation that relates notions of optimal transport, learning probability measures, and unsupervised learning. The proposed approach, HOT-DA, is based on a hierarchical formulation of optimal transport that leverages, beyond the geometrical information captured by the ground metric, richer structural information in the source and target domains. The additional information in the labeled source domain is formed instinctively by grouping samples into structures according to their class labels, while exploring hidden structures in the unlabeled target domain is reduced to the problem of learning probability measures through Wasserstein barycenters, which we prove to be equivalent to spectral clustering. Experiments show the superiority of the proposed approach over the state of the art across a range of domain adaptation problems, including the inter-twinning moons dataset, Digits, Office-Caltech, and Office-Home. Experiments also show the robustness of our model against structure imbalance. We make our code publicly available.
Emotion recognition is a fundamental building block in endowing machines with emotional intelligence. The first models were designed to recognize strongly expressed, easily identifiable emotions. However, we rarely experience this type of emotion in our daily lives. Most of the time, we struggle to identify with certainty our own emotions and those of others: this is emotional ambiguity. The databases at the root of the development of recognition systems must make it possible to introduce ambiguity into the emotional representation. This paper summarizes the main emotional representations and proposes a state of the art of multimodal databases for emotion recognition, with a study of how they address this issue. The paper then discusses the possibility of representing emotional ambiguity with the selected databases.
In this paper, we address the problem of unsupervised domain adaptation, where the goal is to infer a low-target-risk classifier while labeled data are available only from the source domain. Our proposed approach, called DA-OTP, aims to learn a gradual subspace alignment of the source and target domains through Supervised Locality Preserving Projection, so that data projected into the joint low-dimensional latent subspace can be domain-invariant and easily separable. However, this objective can be rather challenging to achieve because of the absence of labeled data in the target domain. To overcome this difficulty, we use an incremental label propagation technique based on optimal transport, which performs selective pseudo-labeling in the target domain. The selected pseudo-labeled target samples are then combined with labeled source samples to learn, in a self-training fashion, a robust classifier after the incremental subspace alignment. Experiments show the competitiveness of the proposed approach against contemporary state-of-the-art methods over a range of domain adaptation problems. We make our code publicly available.
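The selective pseudo-labeling step can be sketched as follows: compute an optimal transport coupling between source and target samples, label each target point with the source class that sends it the most mass, and keep only the targets whose class-mass distribution is most peaked. This is a simplified illustration, not the paper's exact procedure: it uses entropic (Sinkhorn) OT in plain NumPy, a fixed confidence quantile rather than an incremental schedule, and function names of our own.

```python
import numpy as np

def sinkhorn(a, b, M, reg=0.1, n_iter=200):
    """Entropic-regularized OT coupling via Sinkhorn iterations."""
    K = np.exp(-M / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def ot_pseudo_labels(Xs, ys, Xt, keep=0.8):
    """Pseudo-label target points by the source class sending them the most
    transported mass; keep only the `keep` fraction with the most peaked mass."""
    M = ((Xs[:, None, :] - Xt[None, :, :]) ** 2).sum(-1)
    M /= M.max()                                        # scale cost for stability
    a = np.full(len(Xs), 1 / len(Xs))
    b = np.full(len(Xt), 1 / len(Xt))
    G = sinkhorn(a, b, M)
    classes = np.unique(ys)
    mass = np.vstack([G[ys == c].sum(0) for c in classes])  # class-to-target mass
    mass /= mass.sum(axis=0, keepdims=True)
    pseudo = classes[mass.argmax(axis=0)]
    confidence = mass.max(axis=0)
    selected = confidence >= np.quantile(confidence, 1 - keep)
    return pseudo, selected
```

The selected pseudo-labeled targets can then join the labeled source set for the next self-training round.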
Most databases used for emotion recognition assign a single emotion to each data sample. This does not match the complex nature of emotions: we can feel a wide range of emotions throughout our lives with varying degrees of intensity, and we may even experience multiple emotions at once. Furthermore, each person physically expresses emotions differently, which makes emotion recognition even more challenging: we call this emotional ambiguity. This paper investigates this problem through a review of ambiguity in multimodal emotion recognition models. To lay the groundwork, the main representations of emotions, along with solutions for incorporating ambiguity, are described, followed by a brief overview of ambiguity representation in multimodal databases. Thereafter, only models trained on a database that incorporates ambiguity are studied in this paper. We conclude that although databases provide annotations with ambiguity, most of these models do not fully exploit them, showing that there is still room for improvement in multimodal emotion recognition systems.
The multiplex network model has been recently proposed as a means of capturing high-level complexity in real-world interaction networks. This model, in spite of its simplicity, allows handling multi-relational, heterogeneous, dynamic and even attributed networks. However, it requires redefining and adapting almost all basic metrics and algorithms generally used to analyse complex networks. In this work we present MUNA: a MUltiplex Network Analysis library that we have developed in both R and Python on top of the igraph network analysis package. In its current version, MUNA provides primitives to build, edit and modify multiplex networks. It also provides a set of functions computing basic metrics on multiplex networks. However, the most interesting functionality provided by MUNA is probably the wide variety of available community detection algorithms. The library implements different approaches to community detection, including partition aggregation approaches, layer aggregation approaches, and direct multiplex approaches such as the GenLouvain and MuxLicod algorithms. It also offers an extended list of multiplex community evaluation indexes.
Recommendation systems provide the facility to understand a person’s taste and find new, desirable content for them based on the aggregation of their likes and ratings of different items. In this paper, we propose a recommendation system that predicts the rating given by a user to an item. This recommendation system is mainly based on unsupervised topological learning. The proposed approach has been validated on the MovieLens dataset, and the obtained results have shown very promising performance.
Graph clustering techniques are very useful for detecting densely connected groups in large graphs. Many existing graph clustering methods mainly focus on the topological structure but ignore vertex properties, and have only recently been extended to deal with node attributes. In this paper we propose a new method which uses node attribute information along with the topological structure of the network in the clustering process. Experimental results demonstrate the effectiveness of the proposed method through comparisons with state-of-the-art graph clustering methods.
Graph clustering techniques are very useful for detecting densely connected groups in large graphs. Many existing graph clustering methods mainly focus on the topological structure but ignore vertex properties, and have only recently been extended to deal with node attributes. In this paper we propose a new method which uses node attribute information along with the topological structure of the network in the clustering process. To use the information about node attributes, collaborative clustering can be employed in the model. The aim of collaborative clustering is to reveal the common underlying structure of data spread across multiple sites by applying different clustering algorithms, and therefore improve the final clustering result. The purpose of this article is to introduce a new attributed collaborative multi-view network approach based on community detection in networks and topological collaborative learning. The idea consists in modifying the databases by adding virtual points which convey clustering information, to change the position of the centers of the clustering solution. Experimental results demonstrate the effectiveness of the proposed method through comparisons with state-of-the-art graph clustering methods on synthetic and real datasets.
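The core idea of combining topology and node attributes can be illustrated very simply: build one similarity matrix from the adjacency structure, another from the attributes, blend them, and cluster the blend. The sketch below is a minimal stand-in using a convex combination and spectral clustering from scikit-learn; it does not implement the collaborative virtual-points mechanism described above, and the blending weight `alpha` is a hypothetical parameter of our own.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

def attributed_clustering(A, X, k, alpha=0.5):
    """Cluster nodes using both topology (adjacency A) and attributes (X).
    alpha weights the topological similarity against the attribute similarity."""
    S_topo = A / max(A.max(), 1e-12)        # normalized structural similarity
    S_attr = rbf_kernel(X)                  # attribute similarity (Gaussian kernel)
    S = alpha * S_topo + (1 - alpha) * S_attr
    return SpectralClustering(n_clusters=k, affinity="precomputed",
                              random_state=0).fit_predict(S)
```

When topology and attributes agree, the combined affinity sharpens the community structure; when they disagree, `alpha` arbitrates between the two views.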
Collaborative filtering is a well-known technique for recommender systems. Collaborative filtering models use the available preferences of a group of users to make recommendations or predictions of the unknown preferences of other users. Collaborative filtering suffers from the data sparsity problem when users only rate a small set of items, which makes the computation of user similarity imprecise and consequently reduces the accuracy of the recommended items. Clustering techniques, including multiplex network clustering, can be used to deal with this problem. In this paper, we propose a collaborative filtering system based on multiplex network clustering that predicts the rating a user would give to an item. This approach first looks for users having the same behavior or sharing the same characteristics, then uses the ratings from those similar users to predict other ratings. The proposed approach has been validated on the MovieLens dataset, and the obtained results have shown very promising performance.
This paper introduces a new topological clustering approach to cluster high-dimensional datasets based on the t-SNE (t-distributed Stochastic Neighbor Embedding) dimensionality reduction method and Self-Organizing Maps (SOM). Unsupervised learning is often used for clustering data and rarely as a data preprocessing method. However, there are many methods that produce new data representations from unlabeled data, and these unsupervised methods are sometimes used as a preprocessing tool for supervised or unsupervised learning models. The t-SNE method, which produces good results for visualization, allows a projection of the dataset into low-dimensional spaces that makes it easy to use for very large datasets. Using t-SNE during the learning process allows reducing the dimensionality while preserving the topology of the dataset, thereby increasing the clustering accuracy. We illustrate the power of this method with several real datasets.
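The reduce-then-cluster pipeline can be sketched in a few lines with scikit-learn. As a simplification, k-means stands in for the SOM prototype layer used in the approach above (scikit-learn has no SOM); the structure of the pipeline, embedding first and clustering in the embedded space, is the point being illustrated.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

def tsne_then_cluster(X, k, seed=0):
    """Embed X with t-SNE, then cluster in the low-dimensional embedding.
    (k-means here is a stand-in for the SOM stage of the original method.)"""
    Z = TSNE(n_components=2, perplexity=10, init="pca",
             random_state=seed).fit_transform(X)
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(Z)
    return labels, Z
```

Clustering in the 2-D embedding is cheap regardless of the original dimensionality, and the neighborhood-preserving projection tends to keep well-separated groups separable.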
In this paper, we tackle the inductive semi-supervised learning problem, which aims to obtain label predictions for out-of-sample data. The proposed approach, called Optimal Transport Induction (OTI), efficiently extends an optimal-transport-based transductive algorithm (OTP) to inductive tasks in both binary and multi-class settings. A series of experiments are conducted on several datasets in order to compare the proposed approach with state-of-the-art methods. Experiments demonstrate the effectiveness of our approach. We make our code publicly available.