Invited seminar of the STACK team: Suren BYNA (Lawrence Berkeley National Lab)

October 1, 2019 @ 2:00 pm - 4:00 pm

The STACK team is hosting Suren Byna from Lawrence Berkeley National Lab (LBNL). He will give a talk on Tuesday, October 1, at 2:00 pm in room A002.
Title: Proactive Data Containers (PDC): An Intelligent Object-centric Data Management System for HPC
This presentation introduces a novel user-level, object-centric data management system called Proactive Data Containers (PDC). PDC provides abstractions and storage mechanisms that take advantage of deep memory and storage hierarchies, enable proactive, automated performance tuning when storing and retrieving data, and perform user-defined analytics in the data path on large-scale supercomputing systems. While cloud computing environments have been successfully using object-based storage, such as Amazon S3 and OpenStack Swift, parallel file systems on large-scale supercomputing systems are accessed through I/O libraries based on the slow and restrictive POSIX and MPI (Message Passing Interface) I/O standards. These file systems face fundamental challenges in scalable metadata operations, semantics-based performance tuning of data movement, and asynchronous operation. Exacerbating this situation, storage systems on upcoming exascale supercomputers are being deployed with an unprecedented level of complexity because their architectures are built on a deep memory/storage hierarchy. This hierarchy ranges from several levels of volatile memory to non-volatile memory, traditional hard disks, and tapes. Simple and efficient methods of managing and moving data through this hierarchy are critical for the numerous scientific applications that store and analyze massive amounts of data on supercomputing systems.
In the PDC system, scalable metadata management is achieved using the memory available on compute nodes. The metadata objects contain required information, such as data description and ownership, as well as optional provenance and user-defined tags. Data objects are stored and retrieved efficiently using server-initiated, optimized data movement, in which data is stored or cached asynchronously while applications continue their computation.
I will also discuss automatic data analysis and transformations performed while the data is moving from one location to another. I will present the PDC concepts of automatic reorganization and placement of data in the memory and storage hierarchy, closer to where the data is analyzed, based on the history of previous data accesses and on any user-provided hints.
Short bio:
Suren Byna is a Staff Scientist in the Scientific Data Management (SDM) Group in CRD at LBNL. His research interests are in scalable scientific data management. More specifically, he works on optimizing parallel I/O and on developing systems for managing scientific data. He is the PI of the ECP-funded ExaHDF5 project and of two projects funded by the U.S. Department of Energy's Office of Science: object-centric data management systems (Proactive Data Containers, PDC) and experimental and observational data management (EOD-HDF5).


IMT Atlantique
4 Rue Alfred Kastler
Nantes, 44300
