MINERS Working Group

Description:
Data Mining and Machine Learning Team of LIMOS

Communication lead:

MINERS working group website - Sept. 8, 2023 - GENERAL

New URL of our website:

Talk: Uncertainty-aware and Interpretable Photometric Astronomical Time Series Classification - Oct. 18, 2021 - SEMINAR

Authors: Michael Franklin MBOUOPDA and Engelbert MEPHU NGUIFO

Abstract: Given the large amount of data generated by today's telescopes, such as the LSST, machine learning has become indispensable for analyzing these data efficiently and better understanding the universe. Because of the collection methods and the measurement conditions, these data are imprecise and therefore uncertain. This uncertainty needs to be taken into account when building machine learning models for these data. Furthermore, domain experts require interpretable models, both to trust them and to draw confident conclusions from the analysis. Unlike time series classification (TSC), which has been extensively studied during the last decade, the field of uncertain time series classification (uTSC) is still under-explored. The existing works for uTSC are based on the combination of the 1-Nearest Neighbor (1-NN) classifier and an uncertain similarity measure. However, it has recently been shown that this approach is less effective than approaches that classify using local and/or global features extracted from the time series. In this work, we review the existing uncertain similarity measures and propose two novel ones that are based on f-divergences. For the sake of interpretability, we then combine these uncertain measures with the shapelet classification approach in order to classify the PLAsTiCC dataset.
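
To make the notion of f-divergence concrete, here is a minimal sketch for discrete distributions. The abstract does not say which f-divergences the two new measures rely on, so the two generators below (Kullback-Leibler and total variation) are only illustrative assumptions, and the function names are hypothetical.

```python
import math

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_i q_i * f(p_i / q_i), for discrete p, q with every q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def kl_generator(t):
    # f(t) = t * log(t) yields the Kullback-Leibler divergence (f(0) taken as 0).
    return t * math.log(t) if t > 0 else 0.0

def total_variation_generator(t):
    # f(t) = |t - 1| / 2 yields the total variation distance.
    return 0.5 * abs(t - 1.0)

# Toy distributions (illustrative values only).
p = [0.5, 0.5]
q = [0.25, 0.75]
kl = f_divergence(p, q, kl_generator)
tv = f_divergence(p, q, total_variation_generator)
```

Any convex generator f with f(1) = 0 can be plugged in the same way, which is what makes the family convenient for comparing distributions over uncertain values.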

Date: 19 October 2021, in Lyon, France

Kilichi Party - Sept. 21, 2021 - GENERAL

We enjoyed eating Kilichi today. Kilichi is dried meat seasoned with various spices. It is mainly found in Niger, Nigeria and Cameroon. Some of us were discovering it for the first time; for others, it was a fond reminder of what they used to eat.

Check out the Wikipedia page on Kilichi

Measuring consistency for fuzzy logic theories by Prof. Manuel Ojeda-Aciego from the University of Málaga - Sept. 11, 2021 - SEMINAR
17 September 2021, 10 am, ISIMA A102 (Salle du Conseil)
Summary:
Fuzzy logic has been shown to be a suitable framework to handle contradictions in which, unsurprisingly, the notion of inconsistency can be defined in different ways.
This talk starts with a short survey of different ways to define the notion of inconsistency in fuzzy logic systems. As a result, we provide a first notion of inconsistency in terms of the absence of models. Subsequently, we define two measures of consistency that belong purely to the fuzzy paradigm, in the sense that both measures coincide with the crisp notion of consistency when the set of truth values is {0,1}. Accordingly, we can state that the two proposed measures are notions of consistency based on degrees, bringing the spirit of fuzzy logic back into the notion of consistency.

Bio: Manuel Ojeda-Aciego

Manuel Ojeda-Aciego received the M.Sc. in Mathematics in 1990 and the Ph.D. in Computer Science in 1996, both from the University of Málaga, Spain.
He is currently a Full Professor of Applied Mathematics at the University of Málaga. He has (co)authored more than 160 papers in scientific journals and proceedings of international conferences.
His current research interests include fuzzy answer set semantics, inconsistency, residuated and multiadjoint logic programming, fuzzy formal concept analysis, and algebraic structures for computer science.
Dr. Ojeda-Aciego is the President of the Computer Science Committee of the Royal Spanish Mathematical Society, currently serves on the Editorial Boards of the journals "Fuzzy Sets and Systems", "Mathematics", and "Intl J on Uncertainty and Fuzziness in Knowledge-Based Systems", and is a member of the Steering Committees of the conferences "Concept Lattices and their Applications", "Intl Conf on Formal Concept Analysis" and "Information Processing and Management of Uncertainty".

Download the slides

Early diagnosis of Lyme disease by recognizing Erythema Migrans skin lesion from images utilizing deep learning techniques: DSAA 2021 PhD Track - Aug. 18, 2021 - PUBLICATION

Authors: Sk Imran Hossain, Engelbert Mephu Nguifo and Jocelyn de Goër de Herve

Abstract: Lyme disease is one of the most common infectious vector-borne diseases in the world. We extensively studied the effectiveness of convolutional neural networks for identifying Lyme disease from images. Our research plan includes multimodal learning, automation of skin hair mask generation and improving neural architecture search.

Uncertain Time Series Classification - IJCAI 2021 - Aug. 16, 2021 - PUBLICATION

Time series analysis has gained a lot of interest during the last decade, with diverse applications in a large range of domains such as medicine, physics, and industry. The field of time series classification has been particularly active recently, with the development of increasingly efficient methods. However, the existing methods assume that the input time series is free of uncertainty, whereas in some applications the uncertainty is so significant that it cannot be neglected. This project aims to build efficient, robust, and interpretable classification methods for uncertain time series.

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence
Doctoral Consortium. Pages 4903-4904.

What do academics think? - Aug. 11, 2021 - GENERAL

As the last stage of our journey, the Observatoire de la Maturité Data (Data Maturity Observatory) is coming to an end. After introducing you to the topic, sharing feedback from a respondent, and explaining what data maturity means for professionals, today we turn to the point of view of Mr. Engelbert Mephu Nguifo, lecturer and professor at Université Clermont-Auvergne.

Read the article

Accepted for Oral presentation at CAP'2021: Scalable and Accurate Subsequence Transform - May 19, 2021 - PUBLICATION

Authors: Michael Franklin MBOUOPDA and Engelbert MEPHU NGUIFO

Abstract: Time series classification using phase-independent subsequences called shapelets is one of the best approaches in the state of the art. This approach is especially characterized by its interpretability and its fast prediction time. However, given a dataset of n time series of length at most m, learning shapelets requires a computation time of O(n^2 m^4), which is too high for practical datasets. In this paper, we exploit the fact that shapelets are shared by the members of the same class to propose the SAST (Scalable and Accurate Subsequence Transform) algorithm, which is interpretable, accurate and faster than the current state-of-the-art shapelet algorithm. The experiments we conducted on the UCR archive datasets show that SAST is more accurate than the state-of-the-art Shapelet Transform algorithm on many datasets, while being significantly more scalable.
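
The idea of a subsequence transform can be sketched in a few lines. This is a simplified illustration, not the authors' SAST implementation: the choice of reference series, the length range, the absence of normalization, and all function names are assumptions made for the example.

```python
import math

def min_subsequence_distance(series, pattern):
    """Smallest Euclidean distance between `pattern` and any same-length window of `series`."""
    w = len(pattern)
    best = math.inf
    for start in range(len(series) - w + 1):
        d = math.sqrt(sum((series[start + i] - pattern[i]) ** 2 for i in range(w)))
        best = min(best, d)
    return best

def candidate_subsequences(reference, min_len, max_len):
    """All contiguous subsequences of `reference` whose length lies in [min_len, max_len]."""
    return [reference[s:s + w]
            for w in range(min_len, max_len + 1)
            for s in range(len(reference) - w + 1)]

def subsequence_transform(dataset, references, min_len=2, max_len=3):
    """Map each series to its vector of minimal distances to every candidate subsequence."""
    candidates = [c for ref in references
                  for c in candidate_subsequences(ref, min_len, max_len)]
    return [[min_subsequence_distance(ts, c) for c in candidates] for ts in dataset]

# Toy data: one reference series (assumed to represent its class) and two series to transform.
bump = [0.0, 0.0, 1.0, 0.0, 0.0]
flat = [0.0, 0.0, 0.0, 0.0, 0.0]
features = subsequence_transform([bump, flat], references=[bump])
```

Taking candidates from only a few reference series per class, rather than from every training series, is what keeps the number of candidates small; the resulting feature vectors can then be passed to any standard classifier.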


Model overview


Paper's link:

The 2021 ICML Workshop on Computational Biology - May 19, 2021 - SEMINAR

Important Dates
Deadline for submissions: May 25th, 2021 (extended from May 22nd, 2021)
Reviewer deadline: June 11th, 2021
Notification of acceptance: June 14th, 2021
Video recording deadline: June 26th, 2021
Camera-ready deadline: July 16th, 2021
Workshop date: July 24th, 2021

The ICML Workshop on Computational Biology (WCB) will highlight how machine learning approaches can be tailored to making both translational and basic scientific discoveries with biological data. Practitioners at the intersection of computation, machine learning, and biology are in a unique position to frame problems in biomedicine, from drug discovery to vaccination risk scores, and the Workshop will showcase such recent research. Commodity lab techniques lead to the proliferation of large complex datasets, and require new methods to interpret these collections of high-dimensional biological data, such as genetic sequences, cellular features or protein structures, and imaging datasets. These data can be used to make new predictions towards clinical response, to uncover new biology, or to aid in drug discovery.

This workshop aims to bring together interdisciplinary machine learning researchers working in areas such as computational genomics; neuroscience; metabolomics; proteomics; bioinformatics; cheminformatics; pathology; radiology; evolutionary biology; population genomics; phenomics; ecology; cancer biology; causality; representation learning and disentanglement, to present recent advances and open questions to the machine learning community. We especially encourage interdisciplinary submissions that might not neatly fit into one of these categories.

More info:

BOOK: Artificial Intelligence: What is it, exactly? - April 28, 2021 - PUBLICATION


Compressed k-Nearest Neighbors Classification for Evolving Data Streams - April 15, 2021 - SEMINAR
Title: Compressed k-Nearest Neighbors Classification for Evolving Data Streams
Abstract: The infinite nature of data streams leads to the inability to store the flow in its entirety and thus restricts the storage to a part of -- and/or synopsis information from -- the stream. To process these evolving data, we need efficient and accurate methodologies and systems, such as window models and synopsis techniques. This talk will present a first approach that aims to improve the performance of the stream kNN algorithm using a dimension reduction technique to handle high-dimensional data streams. For further improvements, we incorporate this method into an ensemble classifier, Leveraging Bagging (an online version of Bagging). Theoretical guarantees characterizing the similarity between the kNN neighborhoods before and after the projection are provided.
Speaker: Maroua Bahri, bio
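
A rough sketch of the idea, assuming a Gaussian random projection as the dimension reduction step and a fixed-size sliding window as the stream summary. The abstract does not name the exact technique, and the class and method names below are hypothetical.

```python
import math
import random
from collections import Counter, deque

class WindowedProjectedKNN:
    """kNN over a sliding window of randomly projected instances (illustrative only)."""

    def __init__(self, in_dim, out_dim, k=3, window=100, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(out_dim)
        # Fixed random matrix mapping in_dim features to out_dim features.
        self.matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
                       for _ in range(out_dim)]
        self.k = k
        self.window = deque(maxlen=window)  # oldest instances are dropped automatically

    def project(self, x):
        return [sum(w * v for w, v in zip(row, x)) for row in self.matrix]

    def learn_one(self, x, y):
        # Only the compressed instance is stored, never the original one.
        self.window.append((self.project(x), y))

    def predict_one(self, x):
        if not self.window:
            return None
        z = self.project(x)
        nearest = sorted(self.window, key=lambda item: math.dist(z, item[0]))[:self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy stream with two 20-dimensional instances, compressed to 5 dimensions.
model = WindowedProjectedKNN(in_dim=20, out_dim=5, k=1, window=50)
low, high = [0.0] * 20, [10.0] * 20
model.learn_one(low, "a")
model.learn_one(high, "b")
prediction = model.predict_one(high)
```

Both memory and distance computations then scale with the reduced dimension and the window size rather than with the original dimension and the stream length.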


Journée Perspectives et Défis de l'IA (PDIA, Perspectives and Challenges of AI Day) - April 2, 2021 - SEMINAR

The Association Française pour l'Intelligence Artificielle (AFIA) is organizing its seventh PERSPECTIVES AND CHALLENGES OF AI day on the theme of EXPLAINABILITY.

The use of learning and decision-support systems has become commonplace. Studying the reliability and accuracy of these systems has become a subject of major interest, and the need to understand how such systems work, learn or make decisions has become paramount. The aim of this day is to study and discuss all these questions, and to bring together the researchers interested in them.

The day is built around accessible talks, experience reports and round tables that encourage extensive interaction.


More info here:

Photographs from IJCAI 2020 - Yokohama (virtual) - Jan. 18, 2021 - GENERAL

 Sit in the garden

Three papers accepted at the national conference EGC'2020 - Nov. 30, 2020 - PUBLICATION

GPoID : Extraction de Motifs Graduels pour les Bases de Données Imprécises

By: Michael Chirmeni Boujike, Jerry Lonlac, Norbert Tsopze and Engelbert Mephu Nguifo 

Apport de l'entropie pour les c-moyennes floues sur des données catégoriques (French version of Fuzz-IEEE'2020)

By: Abdoul Jalil Djiberou Mahamadou, Violaine Antoine, Engelbert Mephu Nguifo and Sylvain Moreno

Ontology-based data integration in a distributed context of coalition air missions

By: Karima Ennaoui, Mathieu Faivre, Md Shahriar Hassan, Christophe Rey, Lauren Dargent, Hervé Girod and Engelbert Mephu Nguifo

Accepted Paper at ICDMW 2020: Uncertain Time Series Classification with Shapelet Transform - Nov. 16, 2020 - PUBLICATION

Authors: Michael F. MBOUOPDA and Engelbert MEPHU NGUIFO

Abstract: Time series classification is a task that aims at classifying chronological data. It is used in a diverse range of domains such as meteorology, medicine and physics. In the last decade, many algorithms have been built to perform this task with very appreciable accuracy. However, applications where time series have uncertainty have been under-explored. Using uncertainty propagation techniques, we propose a new uncertain dissimilarity measure based on Euclidean distance. We then propose the uncertain shapelet transform algorithm for the classification of uncertain time series. The extensive experiments we conducted on state-of-the-art datasets show the effectiveness of our contribution. The source code of our contribution and the datasets we used are all available on a public repository.
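
As an illustration of uncertainty propagation through a Euclidean-style dissimilarity, the sketch below assumes each value is given as a best guess plus an absolute uncertainty and applies first-order, worst-case propagation rules. The abstract does not spell out the exact rules used in the paper, so both the rules and the function name are assumptions. The squared distance is used to avoid propagating through the square root, which is undefined at zero.

```python
def uncertain_squared_euclidean(x, dx, y, dy):
    """Return (squared distance, propagated uncertainty) for x +/- dx and y +/- dy."""
    dist = 0.0
    unc = 0.0
    for xi, dxi, yi, dyi in zip(x, dx, y, dy):
        diff = xi - yi
        diff_unc = dxi + dyi                 # worst-case uncertainty of a difference
        dist += diff ** 2
        unc += 2.0 * abs(diff) * diff_unc    # first-order uncertainty of diff ** 2
    return dist, unc

# Toy example: two short uncertain series (illustrative values only).
x, dx = [1.0, 2.0, 3.0], [0.1, 0.1, 0.1]
y, dy = [1.0, 4.0, 0.0], [0.2, 0.2, 0.2]
dist, unc = uncertain_squared_euclidean(x, dx, y, dy)
# dist is 13.0 and unc is about 3.0
```

Returning the pair instead of a single number lets a downstream step, such as a shapelet transform, keep track of how reliable each computed dissimilarity is.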


Model overview



A novel algorithm for searching frequent gradual patterns from an ordered data set - Oct. 8, 2020 - PUBLICATION

Accepted Paper at WUML2020 (workshop at ECMLPKDD 2020): Classification of Uncertain Time Series by Propagating Uncertainty in Shapelet Transform - July 24, 2020 - PUBLICATION

Authors: Michael F. MBOUOPDA and Engelbert MEPHU NGUIFO

Abstract: Time series classification is a task that aims at classifying chronological data. It is used in a diverse range of domains such as meteorology, medicine and physics. In the last decade, many algorithms have been built to perform this task with very appreciable accuracy. However, the uncertainty in data is not explicitly taken into account by these methods. Using uncertainty propagation techniques, we propose a new uncertain dissimilarity measure based on Euclidean distance. We also show how to classify uncertain time series using the proposed dissimilarity measure and shapelet transform, one of the best time series classification methods. An experimental assessment of our contribution is done on the well-known UCR dataset.

Accepted Paper at FUZZ-IEEE2020: Categorical fuzzy entropy c-means - May 8, 2020 - PUBLICATION

Authors: Abdoul Jalil Djiberou Mahamadou, Violaine Antoine, Engelbert Mephu Nguifo and Sylvain Moreno

Abstract: Hard and fuzzy clustering algorithms are part of the partition-based clustering family. They are widely used in real-world applications to cluster numerical and categorical data. While in hard clustering an object is assigned to a cluster with certainty, in fuzzy clustering an object can be assigned to different clusters with a membership degree. For both types of method, an entropy term can be incorporated into the objective function, mostly to avoid solutions with too much uncertainty. In this paper, we present an extension of a fuzzy clustering method for categorical data using fuzzy centroids. The new algorithm, referred to as Categorical Fuzzy Entropy (CFE), integrates an entropy term in the objective function. This allows a better fuzzification of the cluster prototypes. Experiments on ten real-world data sets and statistical comparisons show that the new method can efficiently handle categorical data.
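
To show the effect of an entropy term on memberships, here is a generic sketch rather than the CFE update itself. It assumes the common entropy-regularized objective sum_k u_k * d_k + lam * sum_k u_k * log(u_k) with memberships summing to one, whose minimizer is a softmax of the negative dissimilarities. The simple matching dissimilarity, crisp prototypes, and function names are assumptions made for the example.

```python
import math

def simple_matching(a, b):
    """Number of attributes on which two categorical objects differ."""
    return sum(1 for u, v in zip(a, b) if u != v)

def entropy_memberships(obj, prototypes, lam=1.0):
    """Memberships u_k proportional to exp(-d_k / lam); larger lam gives fuzzier memberships."""
    weights = [math.exp(-simple_matching(obj, p) / lam) for p in prototypes]
    total = sum(weights)
    return [w / total for w in weights]

# Toy example with two categorical attributes (illustrative values only).
prototypes = [("red", "small"), ("blue", "large")]
memberships = entropy_memberships(("red", "small"), prototypes, lam=1.0)
```

The parameter lam plays the role of the fuzzifier: as it tends to zero the assignment becomes hard, and as it grows all clusters receive nearly equal membership.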

Paper accepted at CNIA2020: Classification des Séries Temporelles Incertaines par Transformation Shapelet - May 6, 2020 - PUBLICATION

Authors: Michael Franklin MBOUOPDA and Engelbert MEPHU NGUIFO

Abstract: Time series classification is a task that consists of classifying chronological data. It is used in various domains such as meteorology, medicine and physics. Several effective techniques have been proposed over the last ten years to accomplish this task. However, they do not explicitly take into account the uncertainty in the data. Using uncertainty propagation, we propose a new uncertain dissimilarity measure based on the Euclidean distance. We also show how to classify uncertain time series by coupling this measure with the shapelet transform method, one of the best-performing methods for this task. An experimental evaluation of our contribution is carried out on the UCR time series repository.

Accepted Paper at FUZZ-IEEE 2019: Evidential clustering for categorical data - May 6, 2020 - PUBLICATION

Authors: A. J. Djiberou Mahamadou, V. Antoine, G. J. Christie and S. Moreno

Abstract: Evidential clustering methods assign objects to clusters with a degree of belief, allowing for better representation of cluster overlap and outliers. Based on the theoretical framework of belief functions, they generate credal partitions which extend crisp, fuzzy and possibilistic partitions. Despite their ability to provide rich information about the partition, no evidential clustering algorithm for categorical data has yet been proposed. This paper presents a categorical version of ECM, an evidential variant of k-means. The proposed algorithm, referred to as cat-ECM, considers a new dissimilarity measure and introduces an alternating minimization scheme in order to obtain a credal partition. Experimental results with real and synthetic data sets show the potential and the efficiency of cat-ECM for clustering categorical data.

NeuroDeRisk - Semi-annual meeting - April 28, 2020 - SEMINAR

Semi-annual face-to-face meeting of the European project NeuroDeRisk, initially planned in Brussels but held as a web conference because of COVID-19.
A meeting to discuss the deliverables of the last six months and the upcoming ones.

New article published in the journal Pattern Recognition - March 20, 2020 - PUBLICATION

2020 doctoral sash ceremony - Feb. 4, 2020 - SEMINAR

Our new doctors in computer science, Dr. Angeline PLAUD and Dr. Jocelyn DE GOËR, both supervised by Prof. Engelbert MEPHU NGUIFO

BIOSS: Working group on symbolic systems biology - Dec. 20, 2018 - PUBLICATION

For this working group funded by the GdrIA, Mr. Engelbert Mephu Nguifo gave a presentation on Wednesday, December 19, 2018. Its topic was: A Novel Computational Approach for Global Alignment for Multiple Biological Networks.

mephu_expose_IA_multiformes_2.pdf - Nov. 22, 2018 - PUBLICATION

Is artificial intelligence multiform? (L'intelligence artificielle est-elle multiforme ?) - Nov. 21, 2018 - PUBLICATION

Presentation given at the Université Ouverte de Clermont-Ferrand