Building Method for Data Lineage Based on Data Table Similarity Calculation
In the era of big data, it is widely accepted that business departments can unlock the value of data by building on their accumulated business data. However, because different business systems lack unified data standards, problems such as disorganized metadata, data silos, and low-quality data keep emerging. These problems hinder the effective use of data and make data governance necessary. Data lineage analysis is one of the key tasks of metadata management and is of great significance for data traceability and data governance. Traditional methods for constructing data lineage, however, often suffer from high computational complexity, poor accuracy, and high execution cost. To overcome these issues, a data lineage construction method based on data table similarity calculation is proposed. The method builds text feature representations of three elements of a data table (table name, table structure, and data fields), uses TF-IDF to calculate the similarity between data tables, and then constructs the lineage relationships between tables by verifying field overlap and table name similarity with an improved Jaro-Winkler distance algorithm. The results show that the method is effective at constructing data table lineage and supports the smooth progress of data governance work.
data lineage; data governance; metadata; table similarity
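
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each table is described by its name plus its field names, uses a smoothed TF-IDF weighting with cosine similarity, and uses the standard Jaro-Winkler similarity rather than the improved variant, whose details are not given here. The table names, field names, helper functions, and all thresholds are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations

def tokenize(text):
    # Split on anything that is not a letter or digit, so "order_id" -> ["order", "id"].
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs):
    # Smoothed TF-IDF: tf * (ln((1 + n) / (1 + df)) + 1). An assumed weighting.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    return [
        {t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1)
         for t, c in Counter(toks).items()}
        for toks in tokenized
    ]

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def jaro(s1, s2):
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    m1, m2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not m2[j] and s2[j] == ch:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a = [c for c, f in zip(s1, m1) if f]
    b = [c for c, f in zip(s2, m2) if f]
    transpositions = sum(x != y for x, y in zip(a, b)) / 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3

def jaro_winkler(s1, s2, p=0.1):
    # Standard Jaro-Winkler: boost the Jaro score for a shared prefix (max 4 chars).
    j = jaro(s1, s2)
    prefix = 0
    for x, y in zip(s1[:4], s2[:4]):
        if x != y:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

def field_overlap(f1, f2):
    # Jaccard overlap of the two field-name sets (an assumed overlap measure).
    a, b = set(f1), set(f2)
    return len(a & b) / len(a | b) if a | b else 0.0

def candidate_lineage(tables, min_sim=0.3, min_name=0.5, min_overlap=0.5):
    # Step 1: TF-IDF similarity; step 2: verify by name similarity and field overlap.
    names = list(tables)
    docs = [name + " " + " ".join(tables[name]) for name in names]
    vecs = dict(zip(names, tfidf_vectors(docs)))
    result = []
    for a, b in combinations(names, 2):
        sim = cosine(vecs[a], vecs[b])
        name_sim = jaro_winkler(a, b)
        overlap = field_overlap(tables[a], tables[b])
        if sim >= min_sim and name_sim >= min_name and overlap >= min_overlap:
            result.append((a, b, sim, name_sim, overlap))
    return result

# Hypothetical tables used only for illustration.
TABLES = {
    "ods_order": ["order_id", "user_id", "amount", "created_at"],
    "dwd_order_detail": ["order_id", "user_id", "amount", "sku_id"],
    "dim_region": ["region_id", "region_name"],
}

for a, b, sim, name_sim, overlap in candidate_lineage(TABLES):
    print(f"{a} <-> {b}: tfidf={sim:.2f} name={name_sim:.2f} overlap={overlap:.2f}")
```

The sketch only flags undirected candidate pairs. Deciding which table is upstream and which is downstream would need extra information, such as layer prefixes or load timestamps, that the sketch does not model.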