HDRs 2023
Florian Boudin, Analysing and indexing scientific texts
Keywords: Information retrieval, Natural language processing, Keyword indexing, Scientific texts, Graph-based methods, Evaluation, Scientific writing assistance
Abstract
The work presented in this "Habilitation à Diriger des Recherches" (Accreditation to Supervise Research) focuses on the analysis and indexing of scientific texts. It lies at the intersection of two research themes: Natural Language Processing (NLP), which covers the analysis, understanding, and generation of natural language, and Information Retrieval (IR), which studies ways to retrieve information from a collection of documents. We are interested in scholarly document retrieval, that is, searching the scientific literature (e.g., articles, books, theses) for documents related to a specific subject of study. More specifically, our research aims to enrich the metadata associated with documents in order to improve their accessibility and dissemination. Our work focuses on developing automated methods for keyword generation, characterized by their use of graph-based techniques and node ranking algorithms. We address the issue of evaluating automatically generated keywords indirectly, through application-specific tasks such as their use in search engines and academic recommendation systems. We present our efforts in constructing linguistic resources and developing software tools, and in disseminating them within the scientific community. Finally, we conclude with some prospective insights into keyword indexing and, more broadly, into emerging research at the intersection of NLP and IR.
Defense date: 20-06-2023
Jury president: Aurélie Névéol
Jury:
- Aurélie Névéol
- Antoine Doucet
- Jacques Savoy
- Béatrice Daille
- Richard Dufour