
Internship proposal - 2023

Diving into neural language models for improving discourse analysis tasks

Level: Master

Period: 6 months (starting January 2023)


Fine-tuning a pre-trained language model has become the de facto standard for handling natural language processing tasks. Since many of these tasks deal with discourse and dialogue structures (e.g. conversational agents, summarization, dialogue act recognition, argumentation mining), it is crucial to understand how such information is captured by language models and to study how to intervene in the learning of this type of information: what is learned, what is missing, how to add it, how to preserve the useful information in a fine-tuned, distilled, pruned or quantized model…

The internship mission will be defined in this context, collaboratively with the candidate. One possibility would be to start by probing the language models on discourse analysis tasks.

We hope that the successful candidate will go on to pursue a PhD on the subject within the Lexhnology project.
  • A. Rogers, O. Kovaleva, and A. Rumshisky. A Primer in BERTology: What We Know About How BERT Works. Transactions of the Association for Computational Linguistics (TACL), 8:842–866, 2020.
  • V. Araujo, A. Villa, M. Mendoza, M.-F. Moens, and A. Soto. Augmenting BERT-style Models with Predictive Coding to Improve Discourse-level Representations. In EMNLP, November 2021.
  • M. Lukasik, B. Dadachev, G. Simões, and K. Papineni. Text Segmentation by Cross Segment Attention. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 4707–4716, November 16–20, 2020.
  • L. Huber, C. Memmadi, M. Dargnat, and Y. Toussaint. Do sentence embeddings capture discourse properties of sentences from scientific abstracts? In the First ACL Workshop on Computational Approaches to Discourse, 86–95, 2020.
  • F. Koto, J. H. Lau, and T. Baldwin. Discourse Probing of Pretrained Language Models. In Proceedings of the 20th Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Mexico (virtual), 2021.

The Lexhnology project

Lexhnology is a project funded by the French National Research Agency (ANR). It starts on January 2, 2023 and runs for 42 months.

Given the growing extraterritoriality of American law, this domestic law increasingly affects other countries’ jurisdictions. It is therefore of prime importance that second-language (L2) users of legal English be able to analyze case law. Teaching argumentative structure to L2 learners is a widely accepted method in languages for specific purposes (LSP) L2 teaching/learning and may help learners understand the legally binding rationale behind judicial decisions.
Despite this context, there is not yet a consensus on the linguistic definition of the communicative functions, also known as moves, in case law. In addition, no Natural Language Processing (NLP) technique is currently able to identify moves in case law automatically. Finally, the effectiveness of making moves explicit to L2 learners has not been measured experimentally.

To address these gaps, Lexhnology will take an innovative interdisciplinary approach combining linguistics, NLP and LSP teaching/learning. The project is a collaboration between four laboratories, namely LS2N, CRINI, LAIRDIL and ATILF.


The successful candidate is expected to:

  • Hold or be preparing a Master's degree (or equivalent) in Natural Language Processing, Computer Science, Computational Linguistics or Data Science
  • Have an excellent background in deep learning and, more generally, machine learning
  • Have strong programming skills (software development and Python)
  • Have good verbal communication and writing skills (in French/English)
  • Be comfortable working in a team as well as autonomously
  • Be dynamic and curious

We look forward to receiving your online application, which should include:

  • a letter of motivation
  • a CV
  • contacts for two referees
