Phd seminars - Archives

Planning - Presentation - Archives

Knowledge Representation for Ecosystems of Digital Twins

April 3, 2024, 3 p.m. - Salle A104

Digital Twins are an ever-rising topic in Computer Science as well as several other domains of application. The idea to be able to create virtual replicas of the objects that we wish to keep track of in the real world is fascinating and appealing for several reasons including monitoring and forecasting through predictions and simulations. We can soon imagine a world where several Digital Twins of different objects are available, but what happens then? How can we make them interoperable and useful for other applications? In my staying at École des Mines, I tried to understand how Knowledge Representation can make it happen, and what is the current gap in effectively describing Digital Twins for this purpose.

Data Quality Evaluation for Classification Tasks

March 6, 2024, 3 p.m. - Visio-conférence

Data quality is a crucial element for building and optimizing good learning models. Despite many attempts to characterize data quality, rigorous formalization and an efficient quality measure from available observations are still lacking, which is reflected in available data quality tools. Indeed, without a clear understanding of the training and testing processes, it is hard to evaluate the intrinsic performance of a model. This presentation introduces and explains a novel metric to measure data quality. This metric is based on the correlated evolution between the classification performance and the deterioration of data. The proposed method has the significant advantage of being model-independent and does not require metadata or expert knowledge. Furthermore, we provide an interpretation of our metric. We confirm the utility of the proposed metric with intensive numerical experiments and detail some illustrative cases with controlled and interpretable qualities.

Introduction to knowledge graph embedding models

Feb. 7, 2024, 3 p.m. - Visio-conférence

Knowledge Graph Embeddings (KGEs) have become a powerful tool for learning graph representations. They allow entities and relations to be encoded in directed edge-labelled graph (e.g., RDF). We will examine the underlying design principles of KGE models, exploring why they are gaining popularity for graph representation. We will discuss the theoretical underpinnings of KGEs: how they represent entities and relations as vectors in a vector space. We will focus on their current practical applications, such as graph completion, link classification, and so on. We will also discuss the current limitations of KGEs: difficulties in representing certain types of relationships, scaling, integration of external knowledge. Finally, we will present future research directions to overcome these limitations, such as new score functions, knowledge injection and unsupervised learning.

L’information géographique volontaire au service de l’accessibilité

April 19, 2022, 5:45 p.m. - Salle du conseil

no desc

Heterogeneous wireless networks coexistence techniques

March 16, 2022, 5:30 p.m. -

Application physique de preuves à divulgation nulle de connaissance et preuve de np-completeness.

Jan. 25, 2022, 5:30 p.m. - Visio-conférence
Léo Robert

Physical application to zero-knowledge proof protocol and np-completeness.
Abstract: Suguru is a pen-and-paper puzzle which resembles to sudoku; you need to fill a grid with numbers following some rules. How can you convince that you have the solution without revealing any information about the solution?
Firstly, we will see a protocol using day-to-day objects (such as playing cards) to prove that a party has the solution without revealing any piece of information about that solution.
Secondly, we will see a proof about the hardness of solving a suguru instance.

Bayesian Inference Approach For an Efficient Data Sharing among IoT devices

May 19, 2018, 3 p.m. - Salle du conseil
Cristanel RAZAFIMANDIMBY (Inria Lille) 
Nowadays, Internet of Things (IoT) coupled with cloud computing begins to take an important place in economic systems and in society daily life. It has got a large success in several application areas, ranging from smart city applications to smart grids. One major challenge that should be addressed is the huge amount of data generated by the sensing devices, which make the control of sending useless data very important. To face this challenge, we present a Bayesian Inference Approach (BIA), which allows avoiding the transmission of high spatio-temporal correlated data. The proposed BIA is based on a hierarchical architecture with simple nodes, smart gateways and data centers. Belief Propagation algorithm has been chosen for performing an approximate inference on our model in order to reconstruct the missing sensing data. BIA is evaluated based on the data collected from real sensors and according to different scenarios. The results show that our proposed approach reduces drastically the number of transmitted data and the energy consumption, while maintaining an acceptable level of data prediction accuracy.|

Cycle de poids minimum : Polytopes et algorithmes dans les graphes série-parallèles

May 19, 2018, 2:30 p.m. - Salle du conseil
Henri Perret du Cray (Limos)

Un cycle simple est un graphe connexe où tous les sommets ont degré 2. Résoudre le problème du cycle de poids minimum consiste à trouver un cycle simple minimisant une fonction poids sur les arêtes d’un graphe G=(V,E). Je présenterai une formulation linéaire compacte du problème, et montrerai que cette formulation décrit le polytope des cycles lorsque le graphe considéré et série-parallèle. Je présenterai également un algorithme linéaire pour résoudre le problème du cycle de poids minimum dans les graphes séries parallèles, ainsi qu’une variante du problème où le cycle recherché doit avoir une taille fixée.|

An introduction on RPL : IPv6 Routing Protocol for Low-Power and Lossy Networks

March 24, 2017, 3 p.m. - Salle du conseil
Jinpeng Wang

In the recent years, Low Power and Lossy Networks (LLNs) have become one of the most interesting research areas. They typically operate with strict resource constraints in communication, computation, memory, and energy. IP smart object networks are undoubtedly one of the key components of the next wave of the Internet, with an endless number of new opportunities and applications according to newly designed IP based protocols. Routing Protocol for Low-Power and Lossy Networks (RPL) is an ad-hoc on-Demand Distance Vector routing protocol for LLNs that optimizes IPv6 in order to run on over of IEEE 802.15.4 standard, which has an acronym 6LoWPAN. It specifies how to construct Destination Oriented Directed Acyclic Graphs (DODAG) by using an objective function through a set of metrics. In this study, we mainly make an introduction on the mechanism and the most influential parameters of RPL.

Combining CP and ILP in a tree decomposition of bounded height for the sum colouring problem

March 24, 2017, 2:30 p.m. - Salle du conseil
Maël Minot 
Le problème de somme coloration est un problème NP-difficile dérivé du problème bien connu de coloration de graphe. Il consiste à trouver une coloration qui minimise la somme d’entiers correspondant aux couleurs affectées plutôt que le nombre de couleurs utilisées. Ce problème survient notamment dans les domaines de la planification et de l’allocation de ressources. Nous avons mené une évaluation des capacités qu’ont la programmation linéaire (ILP) et la programmation par contraintes (CP) pour résoudre ce problème, et proposé plusieurs améliorations. Nous explorons également les possibilités offertes par une combinaison d’ILP et de CP dans une décomposition arborescente de hauteur bornée, et évaluons des méthodes visant à obtenir rapidement une solution de qualité par combinaison de solutions partielles à partir d’une telle décomposition. Enfin, certaines de ces méthodes peuvent être combinées au sein d’un portfolio visant à tirer partie de leur importante complémentarité.

Fast loop-free transition of routing protocols

Feb. 10, 2017, 3 p.m. - Salle du conseil
Nina Pelagie Bekono :

A routing protocol determines the routes followed by packets to their destination. In networks that operate during a long time, the routing protocol might have to be changed. This change might be required to apply a security update of the protocol, to take into account significant modifications of the topology and link metrics, or to handle urgent traffic in wireless sensor network monitoring applications. The transition from one routing protocol to another involves reconfiguring all routers in a given order. It is a complex and critical task : if the transition is not performed carefully, transient routing loops can occur, and the performance of the network can drastically decrease (decreased bandwidth, increased congestion, loss of packets...). We present a loop-free transition algorithm called ACH (avoiding cycles heuristic), which is able to perform the transition in avery small number of steps.

Process and data based approaches for knowledge discovery : Application to Healthcare

Feb. 10, 2017, 2:30 p.m. - Salle du conseil
Benjamin DALMAS :

The forecast about the increase of the elderly population in French rural regions implies elderly-related healthcare processes to be more efficient. Data and process mining techniques have proven to discover useful knowledge when applied in the healthcare domain. However, it remains challenging because of the complex nature of its processes. In our work, in collaboration with the University Hospital Center of Clermont-Ferrand, we focus on the impact of choices and ordering of activities in the execution of healthcare processes.

Practical Passive Leakage-Abuse Attacks Against Symmetric Searchable Encryption

Jan. 13, 2017, 2:30 p.m. - Salle du conseil
Matthieu GIRAUD :

The problem of securely outsourcing client data with search functionality has given rise to efficient solutions called Symmetric Searchable Encryption (SSE) schemes. These schemes are provably secure with respect to an explicit leakage profile , however, determining how much information can be inferred in practice from this leakage remains difficult. First, we refine and formalize the leakage hierarchy introduced by Cash et al. in 2015. Second, we further the analysis of existing attacks to better understand their real-world efficiency and the practicality of their hypothesis. Finally, we present the first complete practical attacks on L4, L3 and L2 leakage profiles. These attacks are passive and only assume the very realistic knowledge of a small sample of plaintexts , moreover, we show their devastating effect on real-world datasets.

FCA for Software Product Lines representation

Dec. 2, 2016, 3:15 p.m. - Salle du conseil
Jessie Carbonnel (LIRMM) :
Software Product Line Engineering (SPLE) is a software engineering domain in which families of similar softwares (called products) are built reusing common artifacts. This requires to analyze commonalities and variabilities, for example to detect which parts are common to several products and which parts differ from one product to another. Such software characteristics that may be present or not in a product are called features. Several approaches in the literature exist to organize features and product configurations in terms of features. In this paper we review those approaches and show that concept lattices are a relevant structure to organize features and product configurations. We also address scaling issues related to formal context computation in the domain of SPLE.

Traitement des conflits dans des problèmes d’optimisation discrète

Dec. 2, 2016, 2:30 p.m. - Salle du conseil
Alexis CORNET :

De nombreux problèmes d’optimisation sur les graphes, comme le vertex cover connexe, l’arbre de steiner ou les dominants sont NP-complets. Cependant, l’existence d’une solution est garantie - ou il est polynomial d’en déterminer l’existence, et de calculer une solution indépendamment de la valeur de la fonction objectif. Cependant, avec l’ajout de sommets interdits conditionnels, ou conflits - des paires dont les sommets ne peuvent apparaître simultanément dans la solution - l’existence d’une solution n’est plus garantie. Dans les problèmes cités, déterminer l’existence d’une solution a été prouvé NP-complet. Cependant, ces réductions utilisent des graphes quelconques, et un grand nombre de conflits. Nous précisons ces résultats en montrant la NP-complétude des problèmes cités plus haut dans des classes de graphes plus restreintes, et en utilisant un minimum de conflits.

Apache Spark, l’étincelle du big data

Nov. 4, 2016, 3:15 p.m. - Salle du conseil
Carlos CEPEDA :

Apache Spark est un framework open source de big data : il permet d’effectuer un traitement distribué sur un cluster de plusieurs machines. Spark met facilement en place une architecture de calcul distribué et propose des interfaces de programmation en Scala, Java, Python et R. Il dispose de quelques particularités et de plusieurs fonctionnalités que l’on va aborder. Par exemple, contrairement à MapReduce de Hadoop, le calcul est réalisé à partir de données chargées dans des RDD (resilient distributed datasets) et gardées en mémoire vive. Cela limite le nombre d’entrées/sorties avec les disques durs et rend Spark généralement plus rapide que Hadoop(MapReduce). De plus, Spark peut lire plusieurs sources de données différentes : HDFS, HBase, Hive, Cassandra, Amazon S3.

Candidate Keys Enumeration of a Relational Database Schema.

Nov. 4, 2016, 2:30 p.m. - Salle du conseil
Karima ENNAOUI :

We investigate the problem of candidate keys enumeration of a relational database schema. We consider the case where the input is a relation schema, composed of pairs of attributes subsets representing the functional dependencies among attributes. The notion of candidate keys is also known as minimal generators in lattice or FCA terminologies.Lucchesi and Osborn gave in 1978 a polynomial incremental algorithm with an exponential space to enumerate all candidate keys of a relation schema. Saiedian and Spencer in 1996, introduced the notion of attribute graph of a set of functional dependencies and showed that candidate keys are union of candidate keys of strongly connected components of the attribute graph.We exploit the presence of unary functional dependencies that form a partial order over the set of attributes. We introduce the notion of key-ideal set : an ideal associated to a candidate key and distinguish the set of minimal key-ideal sets (minimal inclusion-wise). There is a one to one mapping between key-ideal sets and candidate keys. We show that the number of key-ideal sets can be exponential in the number of minimal key- ideal sets. Moreover, if there is a polynomial delay algorithm to enumerate minimal key-ideal sets then there is one for all candidate keys. We also give an incremental polynomial algorithm to enumerate all minimal key-ideal sets. As a consequence, if the number of minimal key-ideal sets is polynomial in the size of the relational schema, then there is a polynomial delay algorithm to enumerate minimal key-ideal sets.

Fouille de donnees

June 17, 2016, 2:30 p.m. - Salle du conseil
Vanel Siyou (Limos) :

Les méthodes traditionnelles de fouille de données utilisent des vecteurs d’attributs à valeurs déterministes, et ignorent l’aspect incertain des données dans la formulation du problème à traiter (classification supervisée ou non supervisée, recherche de motifs, etc.). Or les données collectées peuvent contenir des erreurs ou être incomplètes (Lindley, 2006), du simple fait des équipements de collecte. Ignorer l’incertitude des données peut engendrer des conclusions approximatives ou inexactes, d’où la nécessité de mettre en œuvre des techniques de gestion de données incertaines (Aggarwal et Yu, 2009). S’il existe des travaux récents dédiés à certaines techniques de fouille de données : indexation de données (Angiulli et Fassetti, 2012), plus proches voisins (Angiulli et Fassetti, 2013), détection d’outliers (Aggarwal et Yu, 2008), arbres de décisions (Tsang et al., 2011), séparateurs à vaste marge (Bing et Zhang, 2004), à notre connaissance, aucun n’est dédié à l’extraction de motifs séquentiels et temporels. L’objet de cette thèse consistera à explorer et à développer de nouveaux modèles d’extraction de motifs séquentiels et temporels, adaptés au contexte d’incertitude de données de la locomotion en FRM. Une des originalités de l’approche résidera dans la prise en compte de l’incertitude dans la formulation du problème (Angiulli et Fassetti, 2013), et pas simplement par la définition de métriques de similarité pour données incertaines (Aggarwal, 2007). Une autre méthode développée au LIMOS sera également examinée et adaptée au problème du FRM (Fournier et al., 2012). La compréhension des relations induites entre ces facteurs permettra la mise en œuvre de modèles optimaux de déplacement des utilisateurs du FRM.

Scheduling resource-constrained projects with transportation constraints.

June 3, 2016, 2:30 p.m. - Salle du conseil
Marina Vinot (Limos) :

In this paper, the traditional resource-constrained project scheduling problem (RCPSP) is extended with transportation constraints on the resources between activities. The assignment of resources problem and the routing scheduling problem (which involves the strategic (i.e., assignment of resources) and tactical/operational (i.e., routing scheduling) decision levels in supply chain management) are interdependent problems. Assuming that the project structure (location, duration and demand of activities) is provided it is also considered that decisions are known at the strategic level of the transportation problem. The routing of resources from a production facility to another is achieved by a heterogeneous fleet of vehicles. To solve the scheduling resource-constrained project with transportation constraints, we take advantage of a flow network model. By starting with a flow structure for the RCPSP with an insertion algorithm, we can propose a giant trip representation for the routing problem. This approach falls into the indirect split resolution scheme and a new splitting algorithm is introduced to obtain the routing solution of the specific pickup and delivery problem (PDP) which can be extended to the dial-a-ride problem (DARP). To improve this solution a local search is defined on the flow structure. A new set of instances is introduced and numerical experiments prove the efficiency of the approach. This work is the first step to solve problems with more than one resource and to minimize several objectives simultaneously. Keywords : RCPSP , Routing.

no title

May 20, 2016, 2:30 p.m. - Salle du conseil
Loukmen Regainia (Limos) :

Abstract—The last decade has witnessed Keywords—Model , UML , Security Patterns , Verification

Parameterized complexity is a recent branch of computational complexity theory.

May 6, 2016, 2:30 p.m. - Salle du conseil
Benjamin Bergougnoux (Limos)

Contrary to the classical approach, we study the complexity of a problem w.r.t. the size of the instance and a parameter (or a collection of parameter). This parameter can take many forms : the size of the solution, the size of a minimum vertex-cover, the treewidth...This allows the classification of NP-hard problems on a finer scale than in the classical setting, where the complexity of a problem is only measured by the number of bits in the input. The main goal when we study a problem is to find a parameter k such that there is an algorithm that resolves the problem in a polynomial time w.r.t. the size of the instance. We call such an algorithm fixed-parameter tractable and we say that the problem in question parameterized by k is FPT. Such an algorithm allows us to solve the problem on arbitrary large instance whenever the parameter is reasonably small. In another words, we try to hunt the difficulty of a problem by trapping it with a parameter. We will see an example of fixed-tractable algorithm that I created during my master thesis. It solves the problem Common. This problem is a global constraint of the well-know constraint satisfaction problem (CSP). Common is a constraint that takes in input two sets of variables X and Y and two sets of values N and M. This constraint is satisfied iff once assigned the number of variables of X (resp. Y) that take a value assigned to a variable in Y (resp. X) is a value in N (resp. M). This algorithm uses a variant of the well know Max Flow problem where each arc has two attributes : the capacity but also the minimum amount of flow that must pass through the arc. There are a lot of new notions, not necessary hard to understand, but I will try to make this talk easy to understand. If the time permits it, I will talk also about the exponential time hypothesis (ETH), a stronger version of P different to NP. This hypothesis states that we can not solve 3-SAT in subexponential-time. ETH allows us to find a lower bound for the time complexity of many problems. And if the time has mercy, I will also talk about the notion of kernelization. This is a recent branch of the parameterized complexity where we study the performance of the preprocessing. Here we do not want to solve the problem but to reduce the size of the instance of a problem parameterized by a parameter k such that : -The original instance is a Yes-instance iff the reduced instance is a Yes-instance. -The size of the reduced instance is bounded by a function depending only of k.

Survey of distance bounding protocols and threats

April 8, 2016, 2:30 p.m. - Salle du conseil
David Gerault (Limos) :

Contactless communications are widely used, from parking badges to electronic passports. Such technologies are subject to attacks where an adversary can impersonnate the owner of a tag to a reader by relaying the messages between them. In the last decades, several distance bounding (DB) protocols have been proposed to counter such attacks. However, they are vulnerable to new threats. In this paper, we present the relations between the threat models and, based on the analysis of more than forty existing protocols, we show the most common attack strategies and their counters. Finally, we give general guidelines to build safer distance bounding protocols.

CoLBA : a Collaborative Load Balancing Algorithm to avoid queue overflow in WSNs

March 25, 2016, 2:30 p.m. - Salle du conseil
Hamadoun Tall (Limos) :

Abstract—Most wireless sensor networks (WSNs) are used for data collection applications. WSNs are known for their ease of deployment which make them very popular. In most data collection applications, traffic is destined towards one node, usually known as the sink. With high data rate, this many-to- one traffic scenario causes queue overflow in the nodes located near the sink. Sensor nodes are known for their limited storage capacities which makes the packet queue overflow even more critical. This issue is often avoided by the routing protocol. Several efforts have recently led to the proposal of several traffic load balancing routing protocols to avoid queue overflow in WSNs. These protocols use various metrics in order to guide the routing mechanism such as the next hop neighbor queue occupancy rate or the next hop current number of children. In this paper, we propose a new routing protocol based on a Collaborative Load Balancing Algorithm (CoLBA). CoLBA uses the time spent by data packets in the queues as a routing metric. In addition to this metric, each node is aware of its queue occupancy rate. In case the receiving node queue is close to be full, all transmitting neighbors will be alerted and asked to choose another next hop. Simulation results on Contiki OS Cooja simulator showed that CoLBA achieves a better packet delivery rate and much less queue overflow compared to a standard routing protocol based on minimal hop counts.

Preuve à divulgation nulle de connaissance

March 11, 2016, 2:30 p.m. - Salle du conseil
Xavier Bultel (Limos) :

-J’ai résolu ton problème, mais je ne veux pas te montrer la solution...  Ha, pas de preuve, pas de chocolat !' Les preuves à divulgation nulle de connaissance permettent de prouver la connaissance d’une solution à une instance d’un problème difficile sans révéler la moindre information sur cette solution. Par exemple, le sudoku est un problème NP. À un concours de sudoku, vous résolvez une grille particulièrement difficile. Vous voulez prouver au jury que vous avez réussi, mais vous craignez que celui-ci soit corrompu et vole votre solution... Que faire ? On s’intéressera d’abord aux preuves de connaissance sur des problèmes classiques de graphes (isomorphisme, k-coloration...), puis on s’intéressera aux cas des problèmes cryptographiques et aux implications de ces preuves.