Metadata Management on Data Processing in Data Lakes - Systèmes d’Informations Généralisées
Conference paper, Year: 2021

Metadata Management on Data Processing in Data Lakes

Abstract

A data lake (DL) is known as a Big Data analysis solution. It stores not only data but also the processes that were carried out on these data. It is commonly agreed that data preparation/transformation takes most of a data analyst's time. To improve the efficiency of data processing in a DL, we propose a framework that includes a metadata model and algebraic transformation operations. The metadata model ensures the findability, accessibility, interoperability and reusability of data processes, as well as the data lineage of processes. Moreover, each process is described through a set of coarse-grained data transformation operations that can be applied to different types of datasets. We illustrate and validate our proposal with an implementation of a real medical use case.
Main file
Data_Lake__Metadata_Management_on_Data_Processing___short_paper_SOFSEM.pdf (989.64 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03141202, version 1 (22-02-2021)

Cite

Imen Megdiche, Franck Ravat, Yan Zhao. Metadata Management on Data Processing in Data Lakes. 47th International Conference on Current Trends in Theory and Practice of Informatics (SOFSEM 2021), Jan 2021, Bozen-Bolzano, Italy. pp.553-562, ⟨10.1007/978-3-030-67731-2_40⟩. ⟨hal-03141202⟩