Modern Computer, 2024, Vol. 30, Issue 11: 91-95. DOI: 10.3969/j.issn.1007-1423.2024.11.016

Research on the Infrastructure of a Real-Time Data Warehouse Based on ClickHouse

Jiang Lei¹, Bai Weili¹, Li Xiaohong¹

Author information

  • 1. School of Big Data and Computer Science, Guangdong Baiyun University, Guangzhou 510450

Abstract

With advances in mobile internet technology and growing user participation in online shopping, e-commerce companies collect large volumes of user behavior logs and business data every day, or even every hour. Traditional real-time computing systems can no longer meet the demand for online analysis and real-time statistics over these logs and business data. Motivated by this need, this paper studies a basic architecture for real-time processing of user behavior data based on a layered design, with the aim of improving computation reuse and reducing development cost under heavy real-time workloads by persisting intermediate results. The architecture uses ClickHouse, a real-time analytical columnar database, and Flink, a real-time stream processing framework, as its core technologies. Real-time computation produces data at day, minute, second, and even sub-second granularity, which helps companies respond and adjust quickly to business changes and meets the real-time computing needs of the new era.
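The idea of keeping intermediate results so that coarser statistics can reuse them can be illustrated with a minimal sketch. It is plain Python rather than Flink or ClickHouse, and the sample events, function names, and layer names are all hypothetical; it is not the authors' implementation, only one way the layered roll-up described in the abstract might look.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw events (ISO-8601 timestamp, event type); not data from the paper.
RAW_EVENTS = [
    ("2024-01-01T10:00:00", "click"),
    ("2024-01-01T10:00:00", "click"),
    ("2024-01-01T10:00:30", "order"),
    ("2024-01-01T10:01:05", "click"),
    ("2024-01-02T09:15:00", "click"),
]


def aggregate_seconds(events):
    """Detail layer -> second-level counts keyed by (second, event type)."""
    counts = defaultdict(int)
    for ts, kind in events:
        second = datetime.fromisoformat(ts).replace(microsecond=0)
        counts[(second, kind)] += 1
    return dict(counts)


def roll_up(counts, truncate):
    """Build a coarser layer from a finer layer that was already computed."""
    rolled = defaultdict(int)
    for (bucket, kind), n in counts.items():
        rolled[(truncate(bucket), kind)] += n
    return dict(rolled)


def to_minute(dt):
    """Truncate a datetime to the start of its minute."""
    return dt.replace(second=0)


def to_day(dt):
    """Truncate a datetime to the start of its day."""
    return dt.replace(hour=0, minute=0, second=0)


# Each layer is derived from the previous one, so the raw events are scanned once.
per_second = aggregate_seconds(RAW_EVENTS)
per_minute = roll_up(per_second, to_minute)
per_day = roll_up(per_minute, to_day)
```

In this sketch the minute and day layers never touch the raw events again, which is the reuse the layered design is aiming for; in a real deployment the layers would presumably be persisted tables rather than in-memory dictionaries.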

Key words

ClickHouse/real-time computing architecture/Flink/big data

Funding

Quality Engineering Project of the Department of Education of Guangdong Province (CXQX-JY202101)

Publication year

2024
Modern Computer
Zhongda Holdings

Impact factor: 0.292
ISSN: 1007-1423