With the sharp increase in the number of malicious software samples, malware homology analysis has become more important than ever as a way to reduce the workload of manual traceability. However, when attackers reuse malicious code, they must set up a specific compilation environment for each attack scenario. This diversity in compilation environments leads to significant variations in the syntax and structure of homologous binaries, which compromises the accuracy of malware homology analysis. To solve this problem, we analyze the impact of compilation environments on binary generation and implement an accurate, unsupervised, and efficient malware homology analysis scheme. We adopt binary promotion and re-optimization techniques to unify binaries at the same intermediate representation layer, which eliminates syntactic and structural changes to a certain extent. To address the insufficiency of the traditional Continuous Bag of Words (CBOW) model in learning token semantics, we propose an instruction-level contextual semantics learning scheme. To account for the low-probability occurrence of context-independent instructions, we use the Smooth Inverse Frequency (SIF) model to calculate the feature vectors of basic blocks. In addition, because library functions and strings in malware carry richer sensitive information, we propose an algorithm for establishing the initial matching set of basic blocks, which further improves the accuracy of malware homology analysis based on the K-Hop greedy matching algorithm and the linear matching algorithm. Experimental results demonstrate the effectiveness of our solution. When applied to the open-source malware Mirai, it achieves better overall performance in terms of analysis accuracy and running cost than the existing unsupervised model and pre-trained model. For various other types of malware, the homology indexes output by this scheme are all higher than the homology judgment threshold we set, further validating its utility in the field of malware homology analysis.
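To make the SIF step concrete, the following is a minimal sketch of SIF-style weighting for basic-block vectors, not the implementation used in this work. It assumes that each basic block is a list of instruction tokens and that token embeddings are already available (for example, from the instruction-level semantics model). The names `sif_weights` and `block_vector`, the smoothing parameter `a = 1e-3`, and the toy embeddings are all hypothetical. The common-component removal that the full SIF model applies after weighting is omitted here for brevity.

```python
from collections import Counter

def sif_weights(blocks, a=1e-3):
    """Map each token to a / (a + p(token)), with p estimated from the corpus."""
    counts = Counter(tok for block in blocks for tok in block)
    total = sum(counts.values())
    return {tok: a / (a + c / total) for tok, c in counts.items()}

def block_vector(block, embeddings, weights):
    """SIF-weighted average of the token embeddings in one basic block."""
    dim = len(next(iter(embeddings.values())))
    acc = [0.0] * dim
    for tok in block:
        w = weights[tok]
        for i, x in enumerate(embeddings[tok]):
            acc[i] += w * x
    n = max(len(block), 1)  # avoid division by zero for an empty block
    return [x / n for x in acc]

# Toy corpus of basic blocks (hypothetical tokens and 2-d embeddings).
blocks = [["mov", "add", "ret"], ["mov", "xor", "ret"], ["mov", "call"]]
embeddings = {
    "mov": [1.0, 0.0],
    "add": [0.0, 1.0],
    "ret": [1.0, 1.0],
    "xor": [0.0, 2.0],
    "call": [2.0, 0.0],
}

weights = sif_weights(blocks)
vec = block_vector(blocks[0], embeddings, weights)
```

In this sketch, frequent tokens such as `mov` receive a smaller weight than rare tokens such as `call`, so instructions that appear almost everywhere contribute less to the basic-block vector.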