MINERS Working Group

Description:
Data Mining Team of LIMOS

Communication officer:
MBOUOPDA Michael Franklin

Accepted for Oral presentation at CAP'2021: Scalable and Accurate Subsequence Transform - May 19, 2021 - PUBLICATION

Authors: Michael Franklin MBOUOPDA and Engelbert MEPHU NGUIFO

Abstract: Time series classification using phase-independent subsequences called shapelets is one of the best approaches in the state of the art. This approach is notable for its interpretability and its fast prediction time. However, given a dataset of n time series of length at most m, learning shapelets requires a computation time of O(n^2 m^4), which is too high for practical datasets. In this paper, we exploit the fact that shapelets are shared by the members of the same class to propose the SAST (Scalable and Accurate Subsequence Transform) algorithm, which is interpretable, accurate and faster than the current state-of-the-art shapelet algorithm. The experiments we conducted on the UCR archive datasets show that SAST is more accurate than the state-of-the-art Shapelet Transform algorithm on many datasets, while being significantly more scalable.
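To make the idea of a subsequence transform concrete, here is a minimal sketch, not the authors' implementation. It assumes plain Euclidean distance without normalization. The function names `subsequence_distance` and `subsequence_transform` are hypothetical. Each series is mapped to a vector of its smallest distances to a set of candidate subsequences, and taking the minimum over all window positions is what makes the feature phase-independent.

```python
import numpy as np

def subsequence_distance(series, shapelet):
    """Smallest Euclidean distance between `shapelet` and any window of `series`."""
    series = np.asarray(series, dtype=float)
    shapelet = np.asarray(shapelet, dtype=float)
    length = len(shapelet)
    if length > len(series):
        raise ValueError("shapelet must not be longer than the series")
    best = float("inf")
    # Slide the candidate over every position; keep the best match.
    for start in range(len(series) - length + 1):
        window = series[start:start + length]
        best = min(best, float(np.linalg.norm(window - shapelet)))
    return best

def subsequence_transform(dataset, candidates):
    """Map each series to its distances to every candidate subsequence."""
    return np.array([[subsequence_distance(s, c) for c in candidates]
                     for s in dataset])

# Toy usage: the first series contains the bump [1, 2, 1], the second does not.
features = subsequence_transform(
    [[0, 0, 1, 2, 1, 0], [0, 0, 0, 0, 0, 0]],
    [[1, 2, 1]],
)
```

The resulting feature matrix can be passed to any standard classifier. How the candidate subsequences are chosen is what distinguishes the different shapelet methods, and that choice is not modelled here.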


Model overview


Paper's link:

The 2021 ICML Workshop on Computational Biology - May 19, 2021 - SEMINAR

Important Dates
Deadline for submissions: May 25th, 2021 (extended from May 22nd, 2021)
Reviewer deadline: June 11th, 2021
Notification of acceptance: June 14th, 2021
Video recording deadline: June 26th, 2021
Camera-ready deadline: July 16th, 2021
Workshop date: July 24th, 2021

The ICML Workshop on Computational Biology (WCB) will highlight how machine learning approaches can be tailored to making both translational and basic scientific discoveries with biological data. Practitioners at the intersection of computation, machine learning, and biology are in a unique position to frame problems in biomedicine, from drug discovery to vaccination risk scores, and the Workshop will showcase such recent research. Commodity lab techniques lead to the proliferation of large complex datasets, and require new methods to interpret these collections of high-dimensional biological data, such as genetic sequences, cellular features or protein structures, and imaging datasets. These data can be used to make new predictions towards clinical response, to uncover new biology, or to aid in drug discovery.

This workshop aims to bring together interdisciplinary machine learning researchers working in areas such as computational genomics; neuroscience; metabolomics; proteomics; bioinformatics; cheminformatics; pathology; radiology; evolutionary biology; population genomics; phenomics; ecology; cancer biology; causality; representation learning and disentanglement, to present recent advances and open questions to the machine learning community. We especially encourage interdisciplinary submissions that might not neatly fit into one of these categories.

BOOK: Artificial Intelligence What is it, exactly? - April 28, 2021 - PUBLICATION


Compressed k-Nearest Neighbors Classification for Evolving Data Streams - April 15, 2021 - SEMINAR
Title: Compressed k-Nearest Neighbors Classification for Evolving Data Streams
Abstract: Because data streams are potentially infinite, the flow cannot be stored in its entirety, so storage is restricted to a part of the stream and/or synopsis information drawn from it. To process these evolving data, we need efficient and accurate methodologies and systems, such as window models and synopsis techniques. This talk will present a first approach that aims to improve the performance of the stream kNN algorithm using a dimension reduction technique to handle high-dimensional data streams. For further improvements, we incorporate this method into an ensemble classifier, Leveraging Bagging (an online version of Bagging). Theoretical guarantees characterizing the similarity between the kNN neighborhoods before and after the projection are provided.
Speaker: Maroua Bahri
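
The combination of a window model with dimension reduction can be sketched as follows. This is an illustrative sketch only, not the method presented in the talk. It assumes a Gaussian random projection as the dimension reduction step and a fixed-size sliding window as the storage model. The class name `ProjectedWindowKNN` and all of its parameters are hypothetical.

```python
from collections import Counter, deque
import numpy as np

class ProjectedWindowKNN:
    """kNN over a sliding window of randomly projected instances (sketch)."""

    def __init__(self, input_dim, reduced_dim, k=3, window_size=100, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: Gaussian random projection, scaled to roughly keep norms.
        self.projection = rng.normal(size=(input_dim, reduced_dim)) / np.sqrt(reduced_dim)
        self.k = k
        # A bounded deque drops the oldest instance once the window is full.
        self.window = deque(maxlen=window_size)

    def learn_one(self, x, y):
        """Store the projected instance together with its label."""
        self.window.append((np.asarray(x, dtype=float) @ self.projection, y))

    def predict_one(self, x):
        """Majority vote among the k nearest stored instances, or None if empty."""
        if not self.window:
            return None
        z = np.asarray(x, dtype=float) @ self.projection
        distances = [np.linalg.norm(z - p) for p, _ in self.window]
        nearest = np.argsort(distances)[: self.k]
        votes = Counter(self.window[int(i)][1] for i in nearest)
        return votes.most_common(1)[0][0]

# Toy usage: 50-dimensional instances are stored in 5 dimensions.
model = ProjectedWindowKNN(input_dim=50, reduced_dim=5, k=3, window_size=6)
for _ in range(3):
    model.learn_one(np.zeros(50), "a")
    model.learn_one(np.full(50, 10.0), "b")
prediction = model.predict_one(np.zeros(50))
```

Distances are computed in the reduced space, so both memory use and query cost depend on `reduced_dim` rather than on the original dimension.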


Journée Perspectives et Défis de l'IA (PDIA) - April 2, 2021 - SEMINAR

The Association Française pour l'Intelligence Artificielle (AFIA) is organizing its seventh PERSPECTIVES ET DEFIS DE L'IA day (Perspectives and Challenges of AI), on the theme of EXPLAINABILITY.

The use of learning and decision-support systems has become commonplace. Studying the reliability and accuracy of these systems has become a subject of major interest, and the need to understand how such systems work, learn or make decisions has become paramount. The aim of this day is to study and discuss all these questions, and to bring together the researchers interested in them.

The day is built around accessible talks, experience reports and round tables that encourage extensive interaction.


Photographs from IJCAI 2020 - Yokohama (virtual) - Jan. 18, 2021 - GENERAL

Three papers accepted at the national conference EGC'2020 - Nov. 30, 2020 - PUBLICATION

GPoID : Extraction de Motifs Graduels pour les Bases de Données Imprécises

By: Michael Chirmeni Boujike, Jerry Lonlac, Norbert Tsopze and Engelbert Mephu Nguifo 

Apport de l'entropie pour les c-moyennes floues sur des données catégoriques (French version of Fuzz-IEEE'2020)

By: Abdoul Jalil Djiberou Mahamadou, Violaine Antoine, Engelbert Mephu Nguifo and Sylvain Moreno

Ontology-based data integration in a distributed context of coalition air missions

By: Karima Ennaoui, Mathieu Faivre, Md Shahriar Hassan, Christophe Rey, Lauren Dargent, Hervé Girod and Engelbert Mephu Nguifo

Accepted Paper at ICDMW 2020: Uncertain Time Series Classification with Shapelet Transform - Nov. 16, 2020 - PUBLICATION

Authors: Michael F. MBOUOPDA and Engelbert MEPHU NGUIFO

Abstract: Time series classification is a task that aims at classifying chronological data. It is used in a diverse range of domains such as meteorology, medicine and physics. In the last decade, many algorithms have been built to perform this task with very appreciable accuracy. However, applications where time series have uncertainty have been under-explored. Using uncertainty propagation techniques, we propose a new uncertain dissimilarity measure based on Euclidean distance. We then propose the uncertain shapelet transform algorithm for the classification of uncertain time series. The extensive experiments we conducted on state-of-the-art datasets show the effectiveness of our contribution. The source code of our contribution and the datasets we used are all available on a public repository.
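
As an illustration of propagating uncertainty through a Euclidean-style dissimilarity, here is a minimal sketch. It is not necessarily the measure defined in the paper. It assumes the simple worst-case (linear) propagation rules: uncertainties add under subtraction, and squaring a value a with uncertainty da gives an uncertainty of 2|a|da. The function name `uncertain_squared_distance` is hypothetical.

```python
import numpy as np

def uncertain_squared_distance(x, dx, y, dy):
    """Squared Euclidean distance between x +/- dx and y +/- dy.

    Returns (value, uncertainty) under worst-case linear propagation,
    which is an assumption made for this sketch.
    """
    x, dx, y, dy = (np.asarray(v, dtype=float) for v in (x, dx, y, dy))
    diff = x - y
    diff_unc = dx + dy                      # uncertainty of a difference
    value = float(np.sum(diff ** 2))
    # Uncertainty of a square: 2 * |a| * da, summed over all time points.
    uncertainty = float(np.sum(2.0 * np.abs(diff) * diff_unc))
    return value, uncertainty

# Toy usage with two series of length 2.
value, uncertainty = uncertain_squared_distance(
    [1.0, 2.0], [0.1, 0.1], [0.0, 4.0], [0.2, 0.0]
)
```

A dissimilarity of this form returns a pair rather than a single number, so any downstream step (for example a shapelet transform) has to decide how to compare or carry both components.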


Model overview



A novel algorithm for searching frequent gradual patterns from an ordered data set - Oct. 8, 2020 - PUBLICATION

Accepted Paper at WUML2020 (workshop at ECMLPKDD 2020): Classification of Uncertain Time Series by Propagating Uncertainty in Shapelet Transform - July 24, 2020 - PUBLICATION

Authors: Michael F. MBOUOPDA and Engelbert MEPHU NGUIFO

Abstract: Time series classification is a task that aims at classifying chronological data. It is used in a diverse range of domains such as meteorology, medicine and physics. In the last decade, many algorithms have been built to perform this task with very appreciable accuracy. However, the uncertainty in data is not explicitly taken into account by these methods. Using uncertainty propagation techniques, we propose a new uncertain dissimilarity measure based on Euclidean distance. We also show how to classify uncertain time series using the proposed dissimilarity measure and shapelet transform, one of the best time series classification methods. An experimental assessment of our contribution is done on the well-known UCR dataset.

Accepted Paper at FUZZ-IEEE2020: Categorical fuzzy entropy c-means - May 8, 2020 - PUBLICATION

Authors: Abdoul Jalil Djiberou Mahamadou, Violaine Antoine, Engelbert Mephu Nguifo and Sylvain Moreno

Abstract: Hard and fuzzy clustering algorithms are part of the partition-based clustering family. They are widely used in real-world applications to cluster numerical and categorical data. While in hard clustering an object is assigned to a cluster with certainty, in fuzzy clustering an object can be assigned to different clusters given a membership degree. For both types of methods, an entropy term can be incorporated into the objective function, mostly to avoid solutions that carry too much uncertainty. In this paper, we present an extension of a fuzzy clustering method for categorical data using fuzzy centroids. The new algorithm, referred to as Categorical Fuzzy Entropy (CFE), integrates an entropy term in the objective function. This allows a better fuzzification of the cluster prototypes. Experiments on ten real-world data sets and statistical comparisons show that the new method can efficiently handle categorical data.
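
To show what an entropy term does to fuzzy memberships, here is a minimal sketch of one membership update. It is not the CFE algorithm itself. It assumes crisp prototypes (modes) instead of fuzzy centroids, simple matching dissimilarity, and the standard entropy-regularized objective, whose minimizer gives memberships proportional to exp(-d / lam). The names `matching_dissimilarity`, `entropy_memberships` and `lam` are hypothetical.

```python
import numpy as np

def matching_dissimilarity(objects, prototypes):
    """Number of attributes on which each object differs from each prototype."""
    objects = np.asarray(objects)
    prototypes = np.asarray(prototypes)
    mismatches = objects[:, None, :] != prototypes[None, :, :]
    return mismatches.sum(axis=2).astype(float)

def entropy_memberships(dissimilarities, lam=1.0):
    """Memberships minimizing sum(u * d) + lam * sum(u * log(u)) per object."""
    d = np.asarray(dissimilarities, dtype=float)
    # Subtract the row minimum before exponentiating for numerical stability.
    weights = np.exp(-(d - d.min(axis=1, keepdims=True)) / lam)
    return weights / weights.sum(axis=1, keepdims=True)

# Toy usage: two categorical objects, two prototypes.
d = matching_dissimilarity([["a", "x"], ["b", "y"]], [["a", "x"], ["b", "y"]])
u = entropy_memberships(d, lam=1.0)
```

Under these assumptions, a larger `lam` pushes the memberships toward a uniform distribution and a smaller one pushes them toward a hard assignment.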

Paper accepted at CNIA2020: Classification des Séries Temporelles Incertaines par Transformation Shapelet (Classification of Uncertain Time Series by Shapelet Transform) - May 6, 2020 - PUBLICATION

Authors: Michael Franklin MBOUOPDA and Engelbert MEPHU NGUIFO

Abstract: Time series classification is the task of classifying chronological data. It is used in various domains such as meteorology, medicine and physics. Several effective techniques have been proposed over the last ten years to accomplish this task. However, they do not explicitly take the uncertainty in the data into account. Using uncertainty propagation, we propose a new uncertain dissimilarity measure based on the Euclidean distance. We also show how to classify uncertain time series by coupling this measure with the shapelet transform method, one of the best-performing methods for this task. An experimental evaluation of our contribution is carried out on the UCR time series repository.

Accepted Paper at FUZZ-IEEE 2019: Evidential clustering for categorical data - May 6, 2020 - PUBLICATION

Authors: A. J. Djiberou Mahamadou, V. Antoine, G. J. Christie and S. Moreno

Abstract: Evidential clustering methods assign objects to clusters with a degree of belief, allowing for better representation of cluster overlap and outliers. Based on the theoretical framework of belief functions, they generate credal partitions which extend crisp, fuzzy and possibilistic partitions. Despite their ability to provide rich information about the partition, no evidential clustering algorithm for categorical data has yet been proposed. This paper presents a categorical version of ECM, an evidential variant of k-means. The proposed algorithm, referred to as cat-ECM, considers a new dissimilarity measure and introduces an alternating minimization scheme in order to obtain a credal partition. Experimental results with real and synthetic data sets show the potential and the efficiency of cat-ECM for clustering categorical data.

NeuroDeRisk - Semi-annual meeting - April 28, 2020 - SEMINAR

Semi-annual face-to-face meeting of the European project NeuroDeRisk, initially planned in Brussels but held by web conference because of COVID-19.
A meeting to discuss the deliverables of the last six months and the upcoming ones.

New article published in the journal Pattern Recognition - March 20, 2020 - PUBLICATION

2020 doctoral sash ceremony - Feb. 4, 2020 - SEMINAR

Our new doctors in computer science, Dr. Angeline PLAUD and Dr. Jocelyn DE GOËR, both supervised by Prof. Engelbert MEPHU NGUIFO

BIOSS: Working group on symbolic systems biology - Dec. 20, 2018 - PUBLICATION

For this working group funded by the GdrIA, Engelbert Mephu Nguifo gave a presentation on Wednesday, December 19, 2018, entitled: A Novel Computational Approach for Global Alignment for Multiple Biological Networks.

mephu_expose_IA_multiformes_2.pdf - Nov. 22, 2018 - PUBLICATION

Is Artificial Intelligence Multiform? (L’intelligence Artificielle est-elle multiforme ?) - Nov. 21, 2018 - PUBLICATION

Presentation given at the Université Ouverte de Clermont-Ferrand