MINERS Working Group
Data Mining and Machine Learning Team of LIMOS
Communication officer:
EL CHEIKH Rim
Members: FALIH Issam - MEPHU NGUIFO Engelbert - MBOUOPDA Michael Franklin - DJIBEROU MAHAMADOU Abdoul Jalil - HOSSAIN Sheikh Imran - Jiarui XIE - Benoit ALBERT - Hélène TRAN - ENNAOUI Karima - YEPMO Véronne - HASSAN Md Shahriar - Jocelyn De GOER
Our website's new URL: https://miners.limos.fr
Talk: Uncertainty-aware and Interpretable Photometric Astronomical Time Series Classification - Oct. 18, 2021 - SEMINAR
Authors: Michael Franklin MBOUOPDA and Engelbert MEPHU NGUIFO
Abstract: Given the large amount of data generated by today's telescopes, such as the LSST, machine learning has become indispensable for analyzing these data efficiently and gaining a better understanding of the universe. The methods used to collect these data and the conditions under which measurements are made are such that the data are imprecise and hence uncertain. This uncertainty needs to be taken into account when building machine learning models for such data. Furthermore, domain experts require interpretable models, both in order to trust them and to draw confident conclusions from the analysis. Unlike time series classification (TSC), which has been extensively studied over the last decade, uncertain time series classification (uTSC) is still under-explored. Existing uTSC methods combine the 1-Nearest Neighbor (1-NN) classifier with an uncertain similarity measure. However, it has recently been shown that this approach is less effective than approaches that classify time series based on local and/or global features extracted from them. In this work, we review the existing uncertain similarity measures and propose two novel ones based on f-divergences. For the sake of interpretability, we then combine these uncertain measures with the shapelet classification approach to classify the PLAsTiCC dataset.
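The abstract does not give the two f-divergence-based measures, so the following is only an illustration of the general idea. It assumes that each uncertain observation is a Gaussian N(mean, std^2), that time steps are independent, and it uses the Jeffreys divergence (the symmetrized Kullback-Leibler divergence, an f-divergence with f(t) = (t - 1) log t). The Gaussian model and the function names are assumptions, not the authors' measures.

```python
from typing import Sequence


def gaussian_jeffreys(mu1: float, sd1: float, mu2: float, sd2: float) -> float:
    """Jeffreys divergence KL(P||Q) + KL(Q||P) between N(mu1, sd1^2) and N(mu2, sd2^2).

    Assumption: each uncertain measurement is modeled as a Gaussian.
    """
    if sd1 <= 0 or sd2 <= 0:
        raise ValueError("standard deviations must be positive")
    v1, v2 = sd1 ** 2, sd2 ** 2
    sq = (mu1 - mu2) ** 2
    # The log(sd2/sd1) and log(sd1/sd2) terms of the two KL divergences cancel out.
    return (v1 + sq) / (2 * v2) + (v2 + sq) / (2 * v1) - 1.0


def uncertain_jeffreys(means_a: Sequence[float], sds_a: Sequence[float],
                       means_b: Sequence[float], sds_b: Sequence[float]) -> float:
    """Hypothetical divergence between two uncertain series with independent time steps.

    KL is additive over independent components, so the series-level divergence
    is the sum of the per-time-step divergences.
    """
    if not (len(means_a) == len(sds_a) == len(means_b) == len(sds_b)):
        raise ValueError("all sequences must have the same length")
    return sum(gaussian_jeffreys(ma, sa, mb, sb)
               for ma, sa, mb, sb in zip(means_a, sds_a, means_b, sds_b))


# Same mean gap, larger error bars: the divergence shrinks (1.0 per step vs 0.25 per step).
tight = uncertain_jeffreys([0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0])
loose = uncertain_jeffreys([0.0, 0.0], [2.0, 2.0], [1.0, 1.0], [2.0, 2.0])
```

The second call shows why such a measure suits noisy photometry: two light curves whose points disagree by the same amount look less different when their error bars are wider.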
Date: 19 October 2021, in Lyon, France
http://www.madics.fr/actions/bigdata4astro/
Kilichi Party - Sept. 21, 2021 - GENERAL
We enjoyed eating Kilichi today. Kilichi is a dried meat seasoned with various spices. It is mainly found in Niger, Nigeria and Cameroon. Some of us were discovering it for the first time; for others, it brought back fond memories of what they used to eat.
Check out the Kilichi page on Wikipedia
Measuring consistency for fuzzy logic theories by Prof. Manuel Ojeda-Aciego from the University of Malaga - Sept. 11, 2021 - SEMINAR
Bio Manuel Ojeda-Aciego
17 September 2021, 10 am
Early diagnosis of Lyme disease by recognizing Erythema Migrans skin lesion from images utilizing deep learning techniques: DSAA 2021 PhD Track - Aug. 18, 2021 - PUBLICATION
Authors: Sk Imran Hossain, Engelbert Mephu Nguifo and Jocelyn de Goër de Herve
Abstract: Lyme disease is one of the most common infectious vector-borne diseases in the world. We extensively studied the effectiveness of convolutional neural networks for identifying Lyme disease from images. Our research plan includes multimodal learning, automation of skin hair mask generation and improving neural architecture search.
Uncertain Time Series Classification - IJCAI 2021 - Aug. 16, 2021 - PUBLICATION
Time series analysis has gained a lot of interest during the last decade, with diverse applications in a wide range of domains such as medicine, physics, and industry. The field of time series classification has been particularly active recently, with the development of increasingly efficient methods. However, existing methods assume that the input time series are free of uncertainty, whereas in some applications the uncertainty is so significant that it cannot be neglected. This project aims to build efficient, robust, and interpretable classification methods for uncertain time series.
What do academics think? - Aug. 11, 2021 - GENERAL
This is the last stage of our series: the Observatoire de la Maturité Data is coming to an end. After introducing you to the topic, sharing one respondent's experience, and explaining what data maturity means for professionals, today we turn to the point of view of Mr. Engelbert Mephu Nguifo, associate professor and professor at the Université Clermont-Auvergne.
Accepted for Oral presentation at CAP'2021: Scalable and Accurate Subsequence Transform - May 19, 2021 - PUBLICATION
Authors: Michael Franklin MBOUOPDA and Engelbert MEPHU NGUIFO
Abstract: Time series classification using phase-independent subsequences called shapelets is one of the best approaches in the state of the art. This approach is notable for its interpretability and its fast prediction time. However, given a dataset of n time series of length at most m, learning shapelets requires a computation time of $O(n^2 m^4)$, which is too high for practical datasets. In this paper, we exploit the fact that shapelets are shared by the members of the same class to propose the SAST (Scalable and Accurate Subsequence Transform) algorithm, which is interpretable, accurate and faster than the current state-of-the-art shapelet algorithm. The experiments we conducted on the UCR archive datasets show that SAST is more accurate than the state-of-the-art Shapelet Transform algorithm on many datasets, while being significantly more scalable.
Model overview
Paper's link: https://hal.uca.fr/hal-03087686
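The abstract only states that SAST exploits the fact that shapelets are shared by members of the same class. A minimal sketch of that idea, under stated assumptions: pick k random reference series per class, take all their z-normalized subsequences of chosen lengths as candidate shapelets, and describe every series by its minimum distance to each candidate. The reference selection, the lengths, and the helper names (`sast_candidates`, `subsequence_transform`) are hypothetical; the linked paper describes the actual algorithm, including the classifier trained on the transformed features (not shown here).

```python
import numpy as np


def znorm(x):
    """Z-normalize a 1-D array; a constant array is only centered."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else x - x.mean()


def sast_candidates(X, y, lengths, k=1, seed=0):
    """Hypothetical helper: every subsequence of the given lengths taken from
    k randomly chosen reference series of each class, z-normalized."""
    rng = np.random.default_rng(seed)
    candidates = []
    for label in np.unique(y):
        members = np.flatnonzero(y == label)
        refs = rng.choice(members, size=min(k, len(members)), replace=False)
        for r in refs:
            for length in lengths:
                for start in range(X.shape[1] - length + 1):
                    candidates.append(znorm(X[r, start:start + length]))
    return candidates


def min_distance(series, shapelet):
    """Smallest Euclidean distance between a shapelet and any z-normalized window of the series."""
    windows = np.lib.stride_tricks.sliding_window_view(
        np.asarray(series, dtype=float), len(shapelet))
    mean = windows.mean(axis=1, keepdims=True)
    std = windows.std(axis=1, keepdims=True)
    # Constant windows are only centered, matching znorm above.
    normed = (windows - mean) / np.where(std > 0, std, 1.0)
    return float(np.min(np.linalg.norm(normed - shapelet, axis=1)))


def subsequence_transform(X, candidates):
    """Hypothetical helper: one feature per candidate, the series' minimum distance to it."""
    return np.array([[min_distance(s, c) for c in candidates] for s in X])


# Toy usage: two series of length 6, one per class, candidates of length 3.
X = np.array([[0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 0.0, 5.0, 5.0, 5.0]])
y = np.array([0, 1])
cands = sast_candidates(X, y, lengths=[3], k=1)
F = subsequence_transform(X, cands)  # shape (2 series, 8 candidates)
```

In this sketch the number of candidates depends on the number of classes and on k, not on the total number of series n, which is one way to see where the gain in scalability over evaluating every subsequence of every series can come from.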
The 2021 ICML Workshop on Computational Biology - May 19, 2021 - SEMINAR
Important Dates
Deadline for submissions: May 25th, 2021 (extended from May 22nd, 2021)
Reviewer deadline: June 11th, 2021
Notification of acceptance: June 14th, 2021
Video recording deadline: June 26th, 2021
Camera-ready deadline: July 16th, 2021
Workshop date: July 24th, 2021
The ICML Workshop on Computational Biology (WCB) will highlight how machine learning approaches can be tailored to making both translational and basic scientific discoveries with biological data. Practitioners at the intersection of computation, machine learning, and biology are in a unique position to frame problems in biomedicine, from drug discovery to vaccination risk scores, and the Workshop will showcase such recent research. Commodity lab techniques have led to a proliferation of large, complex datasets that require new methods to interpret these collections of high-dimensional biological data, such as genetic sequences, cellular features or protein structures, and imaging datasets. These data can be used to make new predictions about clinical response, to uncover new biology, or to aid in drug discovery.
This workshop aims to bring together interdisciplinary machine learning researchers working in areas such as computational genomics; neuroscience; metabolomics; proteomics; bioinformatics; cheminformatics; pathology; radiology; evolutionary biology; population genomics; phenomics; ecology; cancer biology; causality; representation learning and disentanglement to present recent advances and open questions to the machine learning community. We especially encourage interdisciplinary submissions that might not neatly fit into one of these categories.
More info: https://icml-compbio.github.io/#OC
BOOK: Artificial Intelligence: What is it, exactly? - April 28, 2021 - PUBLICATION
Compressed k-Nearest Neighbors Classification for Evolving Data Streams - April 15, 2021 - SEMINAR
Perspectives and Challenges of AI Day (Journée Perspectives et Défis de l'IA, PDIA) - April 2, 2021 - SEMINAR
The Association Française pour l'Intelligence Artificielle (AFIA, French Association for Artificial Intelligence) is organizing its seventh PERSPECTIVES AND CHALLENGES OF AI day, on the theme of EXPLAINABILITY.
The use of learning and decision-support systems has become commonplace. Studying the reliability and accuracy of these systems has become a topic of major interest, and understanding how such systems work, learn, or make decisions has become essential. The goal of this day is to study and discuss all these questions and to bring together the researchers interested in them.
The day is built around accessible talks, experience reports, and round tables that encourage lively interaction.
More info here: https://afia.asso.fr/pdia21/
Photographs from IJCAI 2020 - Yokohama (virtual) - Jan. 18, 2021 - GENERAL
Three papers accepted at the national conference EGC'2020 - Nov. 30, 2020 - PUBLICATION
GPoID: Extraction de Motifs Graduels pour les Bases de Données Imprécises (Gradual Pattern Extraction for Imprecise Databases)
By: Michael Chirmeni Boujike, Jerry Lonlac, Norbert Tsopze and Engelbert Mephu Nguifo
Apport de l'entropie pour les c-moyennes floues sur des données catégoriques (Contribution of Entropy to Fuzzy c-Means on Categorical Data; French version of the Fuzz-IEEE'2020 paper)
By: Abdoul Jalil Djiberou Mahamadou, Violaine Antoine, Engelbert Mephu Nguifo and Sylvain Moreno
Ontology-based data integration in a distributed context of coalition air missions
By: Karima Ennaoui, Mathieu Faivre, Md Shahriar Hassan, Christophe Rey, Lauren Dargent, Hervé Girod and Engelbert Mephu Nguifo
Accepted Paper at ICDMW 2020: Uncertain Time Series Classification with Shapelet Transform - Nov. 16, 2020 - PUBLICATION
Authors: Michael F. MBOUOPDA and Engelbert MEPHU NGUIFO
Abstract: Time series classification is a task that aims at classifying chronological data. It is used in a diverse range of domains such as meteorology, medicine and physics. In the last decade, many algorithms have been built to perform this task with high accuracy. However, applications in which time series are uncertain have been under-explored. Using uncertainty propagation techniques, we propose a new uncertain dissimilarity measure based on the Euclidean distance. We then propose the uncertain shapelet transform algorithm for the classification of uncertain time series. The extensive experiments we conducted on state-of-the-art datasets show the effectiveness of our contribution. The source code of our contribution and the datasets we used are all available in a public repository.
Model overview
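The abstract does not spell out the measure. As a minimal sketch of the technique it names, assume every value comes as a measurement with an absolute uncertainty, x ± dx. Applying standard first-order (worst-case) uncertainty propagation to the squared Euclidean distance, the sum of (x_i - y_i)^2, gives an uncertainty equal to the sum of 2|x_i - y_i|(dx_i + dy_i). The function name is hypothetical, and the paper's exact definition may differ.

```python
from typing import Sequence, Tuple


def uncertain_sq_euclidean(x: Sequence[float], dx: Sequence[float],
                           y: Sequence[float], dy: Sequence[float]) -> Tuple[float, float]:
    """Return (distance, uncertainty) for the squared Euclidean distance
    between x ± dx and y ± dy (hypothetical helper).

    Each term f = (x_i - y_i)^2 has partial derivatives +2(x_i - y_i) and
    -2(x_i - y_i), so worst-case linear propagation adds
    2 * |x_i - y_i| * (dx_i + dy_i) to the uncertainty.
    """
    if not (len(x) == len(dx) == len(y) == len(dy)):
        raise ValueError("all sequences must have the same length")
    dist, unc = 0.0, 0.0
    for xi, dxi, yi, dyi in zip(x, dx, y, dy):
        diff = xi - yi
        dist += diff * diff
        unc += 2.0 * abs(diff) * (dxi + dyi)
    return dist, unc


# Usage: d == 5.0, and u is about 1.0 (0.6 from the first term, 0.4 from the second).
d, u = uncertain_sq_euclidean([1.0, 2.0], [0.1, 0.1], [0.0, 4.0], [0.2, 0.0])
```

Linearization has a known blind spot: when two values coincide, the derivative vanishes and their uncertainty contributes nothing to the result.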
A novel algorithm for searching frequent gradual patterns from an ordered data set - Oct. 8, 2020 - PUBLICATION
Accepted Paper at WUML2020 (workshop at ECMLPKDD 2020): Classification of Uncertain Time Series by Propagating Uncertainty in Shapelet Transform - July 24, 2020 - PUBLICATION
Authors: Michael F. MBOUOPDA and Engelbert MEPHU NGUIFO
Abstract: Time series classification is a task that aims at classifying chronological data. It is used in a diverse range of domains such as meteorology, medicine and physics. In the last decade, many algorithms have been built to perform this task with high accuracy. However, these methods do not explicitly take the uncertainty in the data into account. Using uncertainty propagation techniques, we propose a new uncertain dissimilarity measure based on the Euclidean distance. We also show how to classify uncertain time series using the proposed dissimilarity measure and the shapelet transform, one of the best time series classification methods. An experimental assessment of our contribution is carried out on the well-known UCR dataset.
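In the shapelet transform, the distance between a shapelet and a series is the minimum distance over all aligned windows of the series. The sketch below carries an uncertain distance through that minimum, under assumptions: values are given as x ± dx, windows are not z-normalized, the distance is the first-order propagated squared Euclidean distance, and the "minimum" of uncertain distances is taken on the nominal distance, with ties broken by the smaller uncertainty. The paper may order uncertain values differently, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Uncertain = Tuple[float, float]  # (nominal value, uncertainty)


def uncertain_sq_euclidean(x: Sequence[float], dx: Sequence[float],
                           y: Sequence[float], dy: Sequence[float]) -> Uncertain:
    """Squared Euclidean distance with worst-case first-order uncertainty propagation."""
    dist, unc = 0.0, 0.0
    for xi, dxi, yi, dyi in zip(x, dx, y, dy):
        diff = xi - yi
        dist += diff * diff
        unc += 2.0 * abs(diff) * (dxi + dyi)
    return dist, unc


def uncertain_shapelet_distance(series: Sequence[float], series_err: Sequence[float],
                                shapelet: Sequence[float], shapelet_err: Sequence[float]) -> Uncertain:
    """Minimum uncertain distance between a shapelet and every window of the series.

    Tuples compare lexicographically: nominal distance first, then uncertainty
    (an assumed ordering for this sketch).
    """
    length = len(shapelet)
    if length > len(series):
        raise ValueError("shapelet is longer than the series")
    best = None
    for start in range(len(series) - length + 1):
        cand = uncertain_sq_euclidean(series[start:start + length],
                                      series_err[start:start + length],
                                      shapelet, shapelet_err)
        if best is None or cand < best:
            best = cand
    return best


def uncertain_shapelet_transform(dataset, shapelets) -> List[List[Uncertain]]:
    """dataset: list of (values, errors); shapelets: list of (values, errors)."""
    return [[uncertain_shapelet_distance(s, se, sh, she) for sh, she in shapelets]
            for s, se in dataset]


# The best window is [2.0, 3.0]: nominal distance 0.5, uncertainty about 0.2.
best = uncertain_shapelet_distance([0.0, 1.0, 2.0, 3.0], [0.1] * 4, [2.5, 3.5], [0.0, 0.0])
```

Each transformed feature stays a (value, uncertainty) pair, so the downstream classifier can still see how reliable every shapelet distance is.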
Accepted Paper at FUZZ-IEEE2020: Categorical fuzzy entropy c-means - May 8, 2020 - PUBLICATION
Authors: Abdoul Jalil Djiberou Mahamadou, Violaine Antoine, Engelbert Mephu Nguifo and Sylvain Moreno
Abstract: Hard and fuzzy clustering algorithms belong to the partition-based clustering family. They are widely used in real-world applications to cluster numerical and categorical data. While in hard clustering an object is assigned to a single cluster with certainty, in fuzzy clustering an object can be assigned to several clusters with a membership degree. For both types of methods, an entropy term can be incorporated into the objective function, mostly to avoid solutions with too much uncertainty. In this paper, we present an extension of a fuzzy clustering method for categorical data using fuzzy centroids. The new algorithm, referred to as Categorical Fuzzy Entropy (CFE), integrates an entropy term into the objective function. This allows a better fuzzification of the cluster prototypes. Experiments on ten real-world datasets and statistical comparisons show that the new method can efficiently handle categorical data.
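The abstract does not give CFE's update rules. For illustration only, the sketch below combines two ingredients it mentions: fuzzy centroids for categorical data (each centroid stores, per attribute, a weight for every category) and an entropy term in the objective. For an objective of the form sum(u_ik * d_ik) + lam * sum(u_ik * log u_ik) with memberships summing to 1 over clusters, minimizing over u gives softmax memberships u_ik proportional to exp(-d_ik / lam). The dissimilarity (one minus the centroid's weight on the object's category, summed over attributes), the heuristic centroid update (membership-weighted category frequencies), the parameter `lam`, and all names are assumptions, not the paper's exact formulation.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Centroid = List[Dict[str, float]]  # per attribute: category -> weight


def dissimilarity(obj: Sequence[str], centroid: Centroid) -> float:
    """Sum over attributes of (1 - weight the centroid gives to the object's category)."""
    return sum(1.0 - centroid[j].get(value, 0.0) for j, value in enumerate(obj))


def memberships(obj: Sequence[str], centroids: List[Centroid], lam: float) -> List[float]:
    """Entropy-regularized memberships: softmax of -d / lam over clusters."""
    d = [dissimilarity(obj, c) for c in centroids]
    shift = min(d)  # numerical stability; does not change the softmax
    w = [math.exp(-(di - shift) / lam) for di in d]
    total = sum(w)
    return [wi / total for wi in w]


def update_centroids(data: List[Sequence[str]], U: List[List[float]], n_clusters: int) -> List[Centroid]:
    """Heuristic update: membership-weighted category frequencies per attribute."""
    centroids = []
    for k in range(n_clusters):
        total = max(sum(U[i][k] for i in range(len(data))), 1e-12)
        cent: Centroid = []
        for j in range(len(data[0])):
            weights: Dict[str, float] = defaultdict(float)
            for i, obj in enumerate(data):
                weights[obj[j]] += U[i][k]
            cent.append({cat: w / total for cat, w in weights.items()})
        centroids.append(cent)
    return centroids


def fit(data: List[Sequence[str]], centroids: List[Centroid],
        lam: float = 0.5, n_iter: int = 10) -> Tuple[List[List[float]], List[Centroid]]:
    """Alternate membership and centroid updates for a fixed number of iterations."""
    U: List[List[float]] = []
    for _ in range(n_iter):
        U = [memberships(obj, centroids, lam) for obj in data]
        centroids = update_centroids(data, U, len(centroids))
    return U, centroids


# Toy usage: two obvious groups of categorical objects, crisp initial centroids.
data = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
init = [[{"a": 1.0}, {"x": 1.0}], [{"b": 1.0}, {"y": 1.0}]]
U, centroids = fit(data, init, lam=0.5, n_iter=5)
```

Smaller values of `lam` push the memberships toward a hard partition, while larger values spread them more evenly across clusters; the centroids stay fuzzy because every category keeps a weight proportional to its members' memberships.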
Accepted Paper at CNIA2020: Classification des Séries Temporelles Incertaines par Transformation Shapelet (Classification of Uncertain Time Series by Shapelet Transform) - May 6, 2020 - PUBLICATION
Authors: Michael Franklin MBOUOPDA and Engelbert MEPHU NGUIFO
Abstract: Time series classification is the task of classifying chronological data. It is used in various domains such as meteorology, medicine and physics. Several high-performing techniques have been proposed over the last ten years to perform this task. However, they do not explicitly take the uncertainty in the data into account. Using uncertainty propagation, we propose a new uncertain dissimilarity measure based on the Euclidean distance. We also show how to classify uncertain time series by coupling this measure with the shapelet transform, one of the best-performing methods for this task. An experimental evaluation of our contribution is carried out on the UCR time series repository.
Accepted Paper at FUZZ-IEEE 2019: Evidential clustering for categorical data - May 6, 2020 - PUBLICATION
Authors: A. J. Djiberou Mahamadou, V. Antoine, G. J. Christie and S. Moreno
Abstract: Evidential clustering methods assign objects to clusters with a degree of belief, allowing a better representation of cluster overlap and outliers. Based on the theoretical framework of belief functions, they generate credal partitions, which extend crisp, fuzzy and possibilistic partitions. Despite their ability to provide rich information about the partition, no evidential clustering algorithm for categorical data had been proposed so far. This paper presents a categorical version of ECM, an evidential variant of k-means. The proposed algorithm, referred to as cat-ECM, considers a new dissimilarity measure and introduces an alternating minimization scheme to obtain a credal partition. Experimental results on real and synthetic datasets show the potential and the efficiency of cat-ECM for clustering categorical data.
https://ieeexplore.ieee.org/abstract/document/8858972
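A credal partition gives each object a mass function over subsets of clusters, where mass on a set of several clusters expresses ambiguity between them and mass on the empty set is often used for outliers. One standard way to see how this extends a fuzzy partition is the pignistic transformation from belief-function theory, BetP(w) = sum over focal sets A containing w of m(A) / (|A| (1 - m(empty set))), which turns the masses into ordinary membership probabilities. The sketch below applies it to hypothetical masses for one object; it is not the cat-ECM algorithm itself.

```python
from typing import Dict, FrozenSet


def pignistic(masses: Dict[FrozenSet[str], float]) -> Dict[str, float]:
    """Pignistic transformation of one object's mass function over sets of clusters.

    Mass on a multi-cluster set is shared equally among its clusters, and mass
    on the empty set is discarded by renormalization.
    """
    if abs(sum(masses.values()) - 1.0) > 1e-9:
        raise ValueError("masses must sum to 1")
    empty = masses.get(frozenset(), 0.0)
    if empty >= 1.0:
        raise ValueError("all mass is on the empty set")
    betp: Dict[str, float] = {}
    for focal, m in masses.items():
        if not focal:
            continue  # the empty set carries no cluster information
        share = m / (len(focal) * (1.0 - empty))
        for cluster in focal:
            betp[cluster] = betp.get(cluster, 0.0) + share
    return betp


# Hypothetical masses: mostly cluster w1, some ambiguity with w2, some outlier mass.
m = {frozenset({"w1"}): 0.5, frozenset({"w1", "w2"}): 0.3, frozenset(): 0.2}
probs = pignistic(m)  # w1 about 0.8125, w2 about 0.1875
```

When all the mass sits on single clusters, the transformation returns the masses unchanged, which is exactly the sense in which fuzzy (and, with one cluster at mass 1, crisp) partitions are special cases of credal partitions.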
NeuroDeRisk - Semi-annual meeting - April 28, 2020 - SEMINAR
Semi-annual meeting of the European project NeuroDeRisk, initially planned as a face-to-face meeting in Brussels but held as a web conference because of COVID-19.
The meeting covered the deliverables of the past six months and the upcoming ones.
New article published in the journal Pattern Recognition - March 20, 2020 - PUBLICATION
2020 Doctoral Sash Ceremony - Feb. 4, 2020 - SEMINAR
Our new doctors in computer science, Dr. Angeline PLAUD and Dr. Jocelyn DE GOËR, both supervised by Prof. Engelbert MEPHU NGUIFO.
BIOSS: Working Group on Symbolic Systems Biology - Dec. 20, 2018 - PUBLICATION
This working group is funded by the GdrIA. Mr. Engelbert Mephu Nguifo gave a presentation on Wednesday, December 19, 2018, on the topic: A Novel Computational Approach for Global Alignment for Multiple Biological Networks.
mephu_expose_IA_multiformes_2.pdf - Nov. 22, 2018 - PUBLICATION
mephu_expose_IA_multiformes_2.pdf.zip
Is Artificial Intelligence Multifaceted? (L’intelligence Artificielle est-elle multiforme ?) - Nov. 21, 2018 - PUBLICATION
Presentation given at the Université Ouverte de Clermont-Ferrand