Area of expertise | Laboratoire des Sciences du Numérique de Nantes

SDD - Data Science and Decision-making

Faced with ever-increasing data volumes and growing data complexity, classification models and search algorithms are at the heart of the « Data Science and Decision-Making » topic.

Coordinator: Patricia SERRANO ALVARADO

Teams: COMBI, DUKe, GDD, MéForBio, MODELIS, TASC

The links between statistical data processing and optimization have a long history: the first president of the French Operational Research Society, a recognized statistician, emphasized the importance of interdisciplinarity as early as the Society's creation at the end of the 1950s⁽¹⁾. Under sometimes different names, these links are now gaining new momentum, driven on the one hand by data management and processing specialists facing ever-increasing volumes that require efficient algorithms, and on the other hand by the growing need to refine optimization models by integrating increasingly rich knowledge and to guide search processes in areas of increasing complexity. Publications⁽²⁾ and interdisciplinary workshops⁽³⁾ set out future directions for collaboration between data mining, learning, and combinatorial optimization. The relationships between these disciplines and bioinformatics are intrinsically linked to the development of the latter⁽⁴⁾ and are now being renewed by the considerable changes in the scale of analyzable data brought about by "omics" technologies.

Based on the skills and projects of the teams making up the SDD cluster, four orientations in particular can be defined, which cut across the particularities of the data processed by the various teams.

Enrichment of models: at construction time, by learning specifications (e.g. constraints, objectives, preferences) from historical data and accounting for randomness (uncertainty models, probabilities); as the models evolve, by integrating knowledge obtained from processing activity data (e.g. traces, sensor data).
Learning of resolution strategies: static and dynamic analysis of a search space and its evolution to guide offline or online search; learning of parameters and automatic generation of tactics, with the long-term objective of "autonomous search".
Analysis of search and learning algorithms: improving the efficiency of classification and learning algorithms for metrics and structures; introduction of specifications in human-in-the-loop search processes.
Addition of optimization and data mining features: visualization of traces of optimization heuristics and of data mining observations.

Beyond improving the approaches of the respective communities, fundamental cross-cutting questions arise about how the modelling of the systems under study accounts for different levels of observation: how are these levels related, and how can models be built that are "coherent" across scales?

Bibliography

⁽¹⁾ D. Bayard (2008). Entretiens avec G.-Th. Guilbaud, Mathématiques et Sciences Humaines, n° 183, pp. 35-53.

⁽²⁾ D. Corne, C. Dhaenens, L. Jourdan (2012). Synergies between operations research and data mining – The emerging use of multi-objective approaches, European J. of Operational Research, vol. 221, n° 3, pp. 469-479. M. Milano, P. Van Hentenryck (2014). Looking into the crystal-ball: a bright future for CP, Constraints, vol. 19, pp. 121-125.

⁽³⁾ L. De Raedt, S. Nijssen, B. O'Sullivan, M. Sebag (2014). Constraints, Optimization and Data, Report from Dagstuhl Seminar 1441.

⁽⁴⁾ L.J. Jensen, A. Bateman (2011). The rise and fall of supervised machine learning techniques, Bioinformatics, vol. 27, n° 24, pp. 3331-3332. P. Barahona, L. Krippahl, O. Perriquet (2011). Bioinformatics: a challenge to constraint programming, Hybrid Optimization, Springer Optimization and its Applications, vol. 45, pp. 463-487.