Privacy-Preserving Record Linkage Method Based on Variable-Length Coding and Sliding Window
Privacy-preserving record linkage (PPRL) refers to the efficient identification of records that correspond to the same entity across different databases without revealing the sensitive or confidential information those records contain. The Bloom filter (BF) is a widely used technique in PPRL: it encodes the sensitive information in records and uses q-grams for approximate matching. However, BF encoding is vulnerable to cryptanalysis attacks, and its insensitivity to q-gram position can reduce the precision of record matching. This study proposes a PPRL method based on variable-length coding and sliding-window techniques. Its variable-length record encoding not only makes the encoded record position-sensitive but also hides the frequency information of entity bit arrays by adding random bit arrays before and after the effective bits, which effectively defends against frequency attacks. In addition, a record linkage method based on sliding windows is designed, which first discards a large number of non-matching records with a fast filter and then applies a bidirectional sliding-window exact-matching strategy to the remaining records, improving the efficiency of matching privacy-preserving records. Experimental results on public datasets show that the proposed method encodes records approximately 100 times faster than the BF method and achieves higher matching accuracy. It also provides stronger security in cross-database PPRL.
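To make the position-insensitivity problem concrete, the following is a minimal sketch of the conventional BF baseline, not of the proposed method. The filter length `m`, the number of hash functions `k`, the seeded SHA-256 hashing scheme, and all function names are assumptions chosen for illustration. Because a Bloom filter records only which q-grams occur, two different strings with the same q-gram set receive identical encodings.

```python
import hashlib


def qgrams(value, q=2):
    """Split a string into its overlapping q-grams, in order."""
    return [value[i:i + q] for i in range(len(value) - q + 1)]


def bloom_encode(value, m=64, k=2, q=2):
    """Encode a string as an m-bit Bloom filter over its q-grams.

    m, k, and the seeded SHA-256 hashing are illustrative assumptions.
    """
    bits = [0] * m
    for gram in qgrams(value, q):
        for seed in range(k):
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).digest()
            bits[int.from_bytes(digest[:8], "big") % m] = 1
    return bits


def dice(a, b):
    """Dice coefficient between two bit arrays of equal length."""
    common = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * common / total if total else 0.0


# "abcab" and "cabca" are different strings with the same bigram set
# {ab, bc, ca}, so their Bloom filters are bit-for-bit identical.
same_set = dice(bloom_encode("abcab"), bloom_encode("cabca"))

# A genuine near-match such as "smith" / "smyth" shares only some bigrams.
near_match = dice(bloom_encode("smith"), bloom_encode("smyth"))
```

Here `same_set` is exactly 1.0 even though the two strings differ, which is the kind of false match a position-sensitive encoding is meant to avoid.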