Conference papers

Metadata Management on Data Processing in Data Lakes

Abstract: A data lake (DL) is a well-known solution for Big Data analytics. A data lake stores not only data but also the processes that have been carried out on these data. It is widely agreed that data preparation and transformation take up most of a data analyst's time. To improve the efficiency of data processing in a DL, we propose a framework that includes a metadata model and algebraic transformation operations. The metadata model ensures the findability, accessibility, interoperability and reusability of data processes, as well as the lineage of those processes. Moreover, each process is described by a set of coarse-grained data transformation operations that can be applied to different types of datasets. We illustrate and validate our proposal with an implementation on a real medical use case.
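The abstract does not spell out the metadata model itself. The following is therefore only a minimal sketch, built on our own assumptions, of the general idea of storing data processes as metadata so they can be found again and traced through lineage. The class names `Process` and `ProcessCatalog`, their fields, the operation names `filter` and `join`, and the dataset identifiers are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: NOT the metadata model or operations defined in the paper.


@dataclass
class Process:
    """Hypothetical metadata record for one data process run in the lake."""
    process_id: str
    operation: str                 # coarse-grained operation name, e.g. "join" (assumed)
    inputs: list[str]              # identifiers of the input datasets
    outputs: list[str]             # identifiers of the produced datasets
    parameters: dict = field(default_factory=dict)


class ProcessCatalog:
    """In-memory store of process metadata that answers lineage queries."""

    def __init__(self) -> None:
        self._processes: dict[str, Process] = {}
        self._producer: dict[str, str] = {}  # dataset id -> id of the process that produced it

    def register(self, process: Process) -> None:
        self._processes[process.process_id] = process
        for out in process.outputs:
            self._producer[out] = process.process_id

    def find_by_operation(self, operation: str) -> list[Process]:
        # Findability/reusability: look up earlier processes by operation type.
        return [p for p in self._processes.values() if p.operation == operation]

    def lineage(self, dataset_id: str) -> list[str]:
        """Return ids of all processes upstream of dataset_id, nearest first (breadth-first)."""
        result: list[str] = []
        seen: set[str] = set()
        frontier = [dataset_id]
        while frontier:
            ds = frontier.pop(0)
            pid = self._producer.get(ds)
            if pid is None or pid in seen:
                continue  # raw source dataset, or process already visited
            seen.add(pid)
            result.append(pid)
            frontier.extend(self._processes[pid].inputs)
        return result


# Usage with hypothetical medical-style dataset names.
catalog = ProcessCatalog()
catalog.register(Process("p1", "filter", ["raw_admissions"], ["adult_admissions"], {"age": ">= 18"}))
catalog.register(Process("p2", "join", ["adult_admissions", "lab_results"], ["cohort"]))
print(catalog.lineage("cohort"))  # ['p2', 'p1']
```

Indexing each dataset by the process that produced it reduces a lineage query to a breadth-first walk over recorded processes. A production setting would likely persist these records in a dedicated metadata store rather than in memory.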
Document type: Conference papers
Contributor: Yan Zhao
Submitted on: Monday, February 22, 2021 - 9:14:27 AM
Last modification on: Wednesday, June 9, 2021 - 10:00:32 AM
Long-term archiving on: Sunday, May 23, 2021 - 6:05:55 PM


Restricted access
To satisfy the publisher's distribution rights, the document is embargoed until 2021-07-11.




Imen Megdiche, Franck Ravat, Yan Zhao. Metadata Management on Data Processing in Data Lakes. 47th International Conference on Current Trends in Theory and Practice of Informatics (SOFSEM 2021), Jan 2021, Bozen-Bolzano, Italy. pp.553-562, ⟨10.1007/978-3-030-67731-2_40⟩. ⟨hal-03141202⟩


