Big Data Quality Control Architecture and Cleaning Method in the Transportation Field
To improve the data quality of an integrated transportation big data center and to support data applications, a quality control architecture based on ISACA data governance ideas is proposed. The architecture comprises a data quality check guide, a dirty data identification method, and a data cleaning and governance method. The data quality check guide covers six dimensions: data standardization, data integrity, data consistency, data accuracy, data timeliness, and data accessibility. Dirty data are identified and cleaned with a methodology that combines manual and automatic means. In a verification on rail transit card-swiping data from Chengdu, a total of 11 data quality problems, such as non-standard ticket card types and untimely data transmission, were identified in the card-swiping data table. Of these problems, 90.9% were corrected through both technical and management measures, and the subsequent incoming data met the requirements.
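As an illustration only, the automatic part of dirty data identification can be sketched as rule-based checks on card-swiping records. This is a minimal sketch, not the authors' implementation: the field names (`card_id`, `card_type`, `swipe_time`, `received_time`), the set of allowed ticket card types, and the five-minute transmission threshold are all assumptions made for the example. Only three of the six dimensions (integrity, standardization, timeliness) are covered.

```python
from datetime import datetime, timedelta

# Hypothetical reference values; the source does not specify them.
ALLOWED_CARD_TYPES = {"single", "stored_value", "day_pass"}
MAX_TRANSMISSION_DELAY = timedelta(minutes=5)
REQUIRED_FIELDS = ("card_id", "card_type", "swipe_time", "received_time")


def check_record(record):
    """Return the list of quality dimensions that a record violates."""
    # Integrity: every required field must be present and non-empty.
    # Later checks need these fields, so stop here if any is missing.
    if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return ["integrity"]

    issues = []
    # Standardization: the ticket card type must come from the agreed code list.
    if record["card_type"] not in ALLOWED_CARD_TYPES:
        issues.append("standardization")
    # Timeliness: the record must reach the data center soon after the swipe.
    if record["received_time"] - record["swipe_time"] > MAX_TRANSMISSION_DELAY:
        issues.append("timeliness")
    return issues


# Made-up sample records, one per case.
swipe = datetime(2021, 1, 1, 8, 0)
records = [
    {"card_id": "A1", "card_type": "stored_value",
     "swipe_time": swipe, "received_time": swipe + timedelta(minutes=1)},
    {"card_id": "A2", "card_type": "VIP-X",
     "swipe_time": swipe, "received_time": swipe + timedelta(minutes=1)},
    {"card_id": "A3", "card_type": "single",
     "swipe_time": swipe, "received_time": swipe + timedelta(minutes=30)},
    {"card_id": "A4", "card_type": "",
     "swipe_time": swipe, "received_time": swipe + timedelta(minutes=1)},
]

report = [check_record(r) for r in records]
```

Records flagged by such rules would then go to the cleaning step, while problems that rules cannot express are left to manual review, in line with the combined manual and automatic methodology described above.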