Post-quantum cryptographic algorithms have recently become a popular research topic in security owing to their resistance to quantum attacks. The lattice-based Falcon digital signature algorithm is one of the first four post-quantum cryptographic algorithms selected by NIST for standardization. Key tree generation is the core component of Falcon, and in practice it is time-consuming and resource-intensive. This study therefore proposes a GPU-based parallel key tree generation scheme for Falcon. The scheme combines a Single Instruction Multiple Threads (SIMT) parallel mode with joint control of parity threads and a direct computation mode that avoids intermediate variables, which speeds up the computation and reduces resource consumption. Experiments on a Python-based CUDA platform verify the correctness of the results. On an RTX 3060 Laptop GPU, Falcon key tree generation has a latency of 6 ms and a throughput of 167 key trees per second. The GPU achieves a speedup of 1.17× over the CPU when generating a single Falcon key tree and a speedup of approximately 56× when 1,024 Falcon key trees are generated simultaneously. On the embedded Jetson Xavier NX platform, the throughput is 32 key trees per second.
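The reported latency and throughput on the RTX 3060 Laptop are consistent with each other if key trees are generated one after another, since throughput is then simply the reciprocal of latency. The following minimal sketch checks that arithmetic; the function name `throughput_per_second` is hypothetical and the back-to-back execution model is an assumption, not a description of the measurement procedure.

```python
def throughput_per_second(latency_s: float) -> float:
    """Operations per second, assuming operations run back to back, one at a time."""
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return 1.0 / latency_s


# Reported single key tree latency on the RTX 3060 Laptop: 6 ms.
latency_gpu_s = 6e-3

# Assumption: sequential execution with no overlap between key trees.
print(round(throughput_per_second(latency_gpu_s)))  # prints 167
```

This relation applies only to the single-tree case. The batched figure for 1,024 simultaneous key trees reflects parallel execution and cannot be derived from the single-tree latency alone.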