Internship proposal - 2023
Combining machine learning techniques for “Regulation of Artificial Intelligence” ontology building
Level: Master 2
Duration: 5 months
Research laboratories: DUKe/GDD, LS2N (www.ls2n.fr) & “Law and Social Changes” laboratory (dcs.univ-nantes.fr)
Margo Bernelin (Margo.Bernelin@univ-nantes.fr), “Law and Social Changes” laboratory
Mounira Harzallah (firstname.lastname@example.org), DUKe, LS2N
Patricia Serrano Alvarado (Patricia.Serrano-Alvarado@univ-nantes.fr), GDD, LS2N
The master internship will be carried out in the context of the MétaDroit project (metadata and ontologies in the field of law), a collaboration between the InnovSanté team of the “Law and Social Changes” laboratory (dcs.univ-nantes.fr/) and the DUKe and GDD teams of LS2N.
In this project, we are interested in building an ontology from texts for the “Regulation of Artificial Intelligence” domain. The regulation of Artificial Intelligence is an emerging area of law that is prompting legal discussion. Currently, no ontology formalises the knowledge of the domain and makes it possible, for example, to identify high-risk AI systems and to infer, for each one, a set of rules/constraints that should be respected or a set of horizontal obligations on its providers.
Recently, the European Commission has drafted three texts (our corpus) that aim to regulate all dimensions of AI, from design to marketing and liability (eur-lex.europa.eu/legal-content/FR/ALL/?uri=CELEX%3A52021PC0206). We seek to process the corpus, test several machine learning approaches for ontology learning on it, and then compare the performance of these approaches in order to propose a combined one for building an ontology for the “Regulation of AI”.
The popularity of ontologies and the easy access to a large number of textual resources have strongly motivated the automatic construction of ontologies using artificial intelligence techniques. Two types of construction approaches are distinguished: pattern-based approaches and distributional approaches [Xu et al., 2019; Chen et al., 2020]. Pattern-based approaches show quite high precision, but their recall is low because natural language can express the same meaning in many different ways. Distributional approaches can be supervised or unsupervised. Supervised approaches perform well; however, they are rather sensitive to the distribution of the training dataset, which makes their reliability questionable. In addition, building a training dataset is time-consuming. Unsupervised approaches, and more specifically clustering-based approaches, do not require a training dataset and can handle a large amount of data. However, they face two main difficulties: labeling the clusters and forming semantically consistent clusters that are relevant to the ontology domain. Though these approaches seem to be complementary, there has been rather little work on integrating them.
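The precision/recall trade-off of pattern-based approaches can be illustrated with a minimal sketch. It assumes a single Hearst-style "such as" pattern and approximates the hypernym by the one word preceding it; the function name, the regular expressions and the example sentences are hypothetical and are not taken from the cited works.

```python
import re

# Hypothetical pattern for illustration only:
# "<head noun> such as <item>, <item> and <item>".
# The hypernym is approximated by the single word before "such as";
# a real system would rely on a chunker or parser to find noun phrases.
SUCH_AS = re.compile(r"([\w-]+),? such as ([^.;:]+)")
SEPARATORS = re.compile(r",? (?:and|or) |, ")


def extract_hypernym_pairs(sentence):
    """Return (hyponym, hypernym) pairs matched by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence.lower()):
        hypernym = match.group(1)
        # Split the enumeration that follows "such as" into its items.
        for item in SEPARATORS.split(match.group(2)):
            item = item.strip()
            if item:
                pairs.append((item, hypernym))
    return pairs


matched = extract_hypernym_pairs(
    "Providers must assess systems such as chatbots, "
    "recommender engines and biometric scanners."
)
# The same kind of relation phrased differently is missed by the pattern.
missed = extract_hypernym_pairs("A chatbot is a kind of AI system.")
```

When the pattern fires, the extracted pairs are usually correct, but any sentence that expresses the relation with a different wording yields nothing, which is the low-recall behaviour described above.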
Master internship purpose
The purpose of the master internship is to compare and combine four approaches for the semi-automatic construction of an ontology for the “Regulation of AI” domain. The work will focus on the extraction of two ontology components: concepts and hypernym relationships. Four approaches will be compared: core-concept seeded LDA [Huang et al. 2021], a similarity-measure-based approach [Albukhitan, 2017], a supervised word-embedding-based approach [Mikolov et al. 2013], and a hypernym-pattern-based approach [Alaa Aldine et al. 2021]. The combined approach will be driven by a core ontology, and its results will be enriched with knowledge from knowledge graphs such as DBpedia.
1. State of the art: core-concept seeded LDA, pattern-based approach, combined approach.
2. Building a benchmark (a gold standard ontology) from the “Regulation of AI” corpus.
3. Testing the four approaches for ontology learning and evaluating their results.
4. Proposal of a combined approach for learning an ontology for the domain of “Regulation of AI”.
5. Implementation and validation of the combined approach on the “Regulation of AI” corpus.
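As an illustration of how the output of an approach could be compared with the gold standard ontology of step 2, the following minimal sketch computes precision, recall and F1 over sets of (hyponym, hypernym) pairs. The exact-match criterion, the function name and the toy pairs are assumptions made for this example; the evaluation protocol of the internship is not specified here.

```python
def evaluate(extracted, gold):
    """Compare extracted pairs with gold-standard pairs (exact match).

    Returns a (precision, recall, f1) tuple; each value is 0.0 when its
    denominator would be zero.
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data, invented for the example.
gold_pairs = {
    ("chatbot", "ai system"),
    ("biometric identification system", "high-risk ai system"),
    ("provider", "operator"),
    ("importer", "operator"),
}
extracted_pairs = {
    ("chatbot", "ai system"),
    ("provider", "operator"),
}

precision, recall, f1 = evaluate(extracted_pairs, gold_pairs)
```

Running the same function on the output of each of the four approaches gives comparable scores on a single benchmark, which is one possible basis for deciding how to combine them.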
Candidate profile: knowledge of data mining/machine learning; knowledge of the semantic web and NLP will be strongly appreciated but is not mandatory; knowledge of programming languages, mainly Python.
To apply, send your CV, a cover letter and your transcripts of records for the last three years to email@example.com, firstname.lastname@example.org and email@example.com