Design and Implementation of a Cloud Native Data Lake Platform
Cloud native data lakes have become a research hotspot in data management and analytics, and the related technologies and applications have been widely studied. However, data lake deployment suffers from high cost and poor compatibility between components; the lack of separation between storage and computation restricts the extensibility of the data lake platform; and the absence of a complete data ingestion mechanism easily turns the data lake into a data swamp, leaving users unable to extract value from the data. We design and implement a cloud native data lake service platform that uses Kubernetes as the underlying layer to build the cloud native environment and uses container technology to package the data lake components as images. A storage-compute separation scheme is designed to improve the scalability and portability of the platform, and the containerized images are combined with monitoring and a production-line pipeline to enable operation of the data lake on the cloud. The platform also bridges users' ingestion operations and the cloud native computing engine: it pre-processes the ingestion information, provides multiple types of operations to cover diverse ingestion scenarios, and writes data to the data lake through a unified catalog. Results from actual operation show that the platform improves the flexibility and reliability of the data lake and ensures standardized storage of metadata and data assets.
cloud native; data lake; big data; production line; data lake on cloud