An efficient parallel computing method was proposed based on the architectural characteristics of a high-performance heterogeneous accelerator and the training mode of MiniGo. The on-chip computing resources were planned so that execution could be pipelined across the heterogeneous devices. Because shared storage segments exist between the heterogeneous devices, a shared-memory programming scheme was designed to reduce data-transfer costs. Exploiting the multiple computing resources within a digital signal processing (DSP) cluster, and taking into account the compute and memory-access characteristics of each operator, different optimization strategies were designed. At the same time, the method provides an easy-to-use, high-performance operator library for TensorFlow. The experimental results show that the method realizes multi-core parallel computing of operators: convolution achieved a speedup of 24.69 over a single core, and, compared with a cropped 8-core configuration of the FT2000+ CPU, training and self-play execution achieved speedups of 3.83 and 1.5, respectively.
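The pipelined overlap between the host and the accelerator summarized above can be illustrated with a double-buffered producer/consumer pattern. The following is a minimal sketch, not the paper's actual implementation: a POSIX thread stands in for the DSP cluster (the real system would launch operator kernels through the accelerator's runtime), and the array shared_buf stands in for the shared storage segment between the heterogeneous devices. All identifiers here are hypothetical.

    /* Minimal double-buffered pipeline sketch (illustrative only).
     * The "device" thread models a DSP-cluster operator kernel; in the
     * real system this stage would be a kernel launched on the DSP
     * cluster, reading directly from the shared storage segment. */
    #include <pthread.h>
    #include <stdio.h>

    #define BUF_ELEMS 1024
    #define N_BATCHES 8

    static float shared_buf[2][BUF_ELEMS];  /* stand-in for the shared segment */
    static pthread_barrier_t sync_point;    /* hand-off between pipeline stages */

    /* Device stage: consume the half of the buffer published by the host. */
    static void *device_consume(void *arg)
    {
        (void)arg;
        for (int b = 0; b < N_BATCHES; b++) {
            pthread_barrier_wait(&sync_point);   /* wait until buffer b%2 is ready */
            float acc = 0.0f;
            for (int i = 0; i < BUF_ELEMS; i++)
                acc += shared_buf[b % 2][i];     /* "compute" on the ready half */
            printf("batch %d consumed, checksum %.1f\n", b, acc);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t dev;
        pthread_barrier_init(&sync_point, NULL, 2);
        pthread_create(&dev, NULL, device_consume, NULL);

        for (int b = 0; b < N_BATCHES; b++) {
            /* Host stage: prepare batch b in buffer b%2. */
            for (int i = 0; i < BUF_ELEMS; i++)
                shared_buf[b % 2][i] = (float)b;
            pthread_barrier_wait(&sync_point);   /* publish buffer to device stage */
            /* While the device works on b%2, the next iteration fills (b+1)%2. */
        }

        pthread_join(dev, NULL);
        pthread_barrier_destroy(&sync_point);
        return 0;
    }

While the device stage consumes one half of the buffer, the host stage fills the other half, so data preparation and operator computation overlap rather than serialize; the same hand-off discipline applies when the shared storage segment removes the explicit host-to-device copy.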